Replies: 1 comment
-
@toriving, thanks for your question. Can you please create an issue so we can investigate further?
-
Hello.
I am training a model using DeepSpeed.
I tried using Adafactor (a non-native optimizer) as the optimizer.
However, the loss did not decrease, and this only happens when using DeepSpeed.
Does DeepSpeed not currently support adaptive-style optimizers?
Or does it not support Adafactor specifically?
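For reference, since the exact training script is not shown in the question, here is a minimal sketch of how a non-native (client) optimizer such as Adafactor is commonly handed to DeepSpeed through `deepspeed.initialize`. The model, config path, and hyperparameters below are illustrative assumptions, not the setup from the question:

```python
# Minimal sketch: passing Adafactor to DeepSpeed as a client ("non-native")
# optimizer. The toy model, ds_config.json path, and learning rate are
# placeholders, not taken from the question.
import torch
import deepspeed
from transformers.optimization import Adafactor

model = torch.nn.Linear(512, 512)  # placeholder for the real model

# With relative_step/scale_parameter disabled, Adafactor uses the explicit
# learning rate instead of its internal relative-step schedule.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)

# The DeepSpeed JSON config is assumed to have no "optimizer" section,
# so DeepSpeed wraps the client optimizer passed here instead of building
# one of its native optimizers.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    config="ds_config.json",  # illustrative path
)
```

One thing that may be worth checking in a setup like this: with Adafactor's default `relative_step=True`, the explicit learning rate is ignored in favor of an internal schedule, which can interact badly with an externally configured scheduler and leave the loss flat.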