LoRA possible on MusicGen models? #786
DavidNTompkins asked this question in Q&A (unanswered). Replies: 1 comment, 1 reply.
Reply: If you could please provide the code that you tried to use and a copy of the full error message, we can investigate.
Original question from DavidNTompkins:
Apologies if this is a silly question, but is there any reason why I couldn't use LoRA or prompt tuning on a MusicGen model? I'm reading the paper (https://arxiv.org/pdf/2306.05284.pdf) and it seems like it should work. Here's the text describing their transformer model:
We train autoregressive transformer models at different sizes: 300M, 1.5B, 3.3B parameters. We use a memory efficient Flash attention [Dao et al., 2022] from the xFormers package [Lefaudeux et al., 2022] to improve both speed and memory usage with long sequences. We study the impact of the size of the model in Section 4. We use the 300M-parameter model for all of our ablations. We train on 30-second audio crops sampled at random from the full track. We train the models for 1M steps with the AdamW optimizer [Loshchilov and Hutter, 2017], a batch size of 192 examples, β1 = 0.9, β2 = 0.95, a decoupled weight decay of 0.1 and gradient clipping of 1.0. We further rely on D-Adaptation based automatic step-sizes [Defazio and Mishchenko, 2023] for the 300M model as it improves model convergence but showed no gain for the bigger models. We use a cosine learning rate schedule with a warmup of 4000 steps. Additionally, we use an exponential moving average with a decay of 0.99. We train the 300M, 1.5B and 3.3B parameter models, using respectively 32, 64 and 96 GPUs, with mixed precision. More specifically, we use float16 as bfloat16 was leading to instabilities in our setup. Finally, for sampling, we employ top-k sampling [Fan et al., 2018] with keeping the top 250 tokens and a temperature of 1.0.
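If I'm reading that right, the optimization setup boils down to something like the sketch below. This is my own paraphrase in plain PyTorch, not code from the repo: the tiny stand-in model and the base learning rate are placeholders (the quote doesn't give a base LR for the non-D-Adaptation runs), and it skips D-Adaptation and the EMA.

```python
# Paraphrase of the quoted training setup (not audiocraft's actual code).
import math
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the MusicGen language model

# AdamW with beta1=0.9, beta2=0.95 and decoupled weight decay 0.1;
# lr=1e-4 is a placeholder, since the quote doesn't state the base LR.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-4, betas=(0.9, 0.95), weight_decay=0.1
)

warmup_steps, total_steps = 4_000, 1_000_000

def cosine_with_warmup(step):
    # 4000-step warmup, then cosine decay over the remaining 1M training steps
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, cosine_with_warmup)

# each step: loss.backward(); clip grads to 1.0; optimizer.step(); scheduler.step()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```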
I'm just a GPU jockey and not an ML engineer. Am I chasing ghosts here?
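For what it's worth, here's the sort of thing I was imagining, using the Hugging Face transformers port of MusicGen together with the peft library. It's completely untested, and the target_modules names are my guess at the decoder's attention projections, so they may need adjusting after inspecting model.named_modules().

```python
# Untested sketch: wrap the Hugging Face MusicGen port with LoRA adapters via peft.
# Assumes `transformers` (with the MusicGen port) and `peft` are installed.
from transformers import MusicgenForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "v_proj"],  # guessed names for the decoder self-attention projections
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the injected LoRA weights should be trainable
```

If this wraps cleanly, print_trainable_parameters() should report only a small fraction of the model's weights as trainable, which is the whole point of LoRA; whether it then fine-tunes well on music data is exactly what I'm asking about.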