Replies: 1 comment 1 reply
-
Hi @hezq06, thanks for your question, this is interesting. I have never tried this, but I believe you should be able to do this mostly outside of DeepSpeed, in your dataloader, by making sure you get the 10:9:5 split you desire. One thing to note: I would make sure to align torch versions across your cluster, and ideally CUDA versions as well.
-
Our lab has 3 servers (bought at different years) with a speed ratio 10:9:5. To maintain load balance, I'm thinking of tweaking the batch size also to be 10:9:5. This seems to be a simple idea (being able to set batch size for each node separately and use that number as a weight during the parameter update stage). However, I cannot find an easy solution for implementing that with DeepSpeed. Does anyone know a way?
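  A minimal sketch of the proportional-split idea, done outside DeepSpeed as plain Python. All names here are illustrative assumptions, not DeepSpeed API: the global batch is divided across nodes in the 10:9:5 ratio, and each rank's gradient would then be scaled by its share of the global batch before the all-reduce sum, so the combined update matches one uniform large batch.

  ```python
  # Sketch: per-node batch sizes and gradient weights for a 10:9:5 cluster.
  # Hypothetical helper code to run in your own dataloader / training loop.

  SPEEDS = [10, 9, 5]  # assumed relative throughput of the 3 servers

  def per_rank_batch_sizes(global_batch):
      # Split a global batch proportionally to node speed, so every node
      # finishes its step in roughly the same wall-clock time.
      total = sum(SPEEDS)
      sizes = [global_batch * s // total for s in SPEEDS]
      sizes[0] += global_batch - sum(sizes)  # give rounding remainder to rank 0
      return sizes

  def grad_weights():
      # Weight each rank's gradient by its share of the global batch, so a
      # summed all-reduce of the weighted gradients equals the average
      # gradient over one uniform global batch.
      total = sum(SPEEDS)
      return [s / total for s in SPEEDS]

  print(per_rank_batch_sizes(48))  # -> [20, 18, 10]
  print(grad_weights())
  ```

  In an actual run you would multiply each parameter's `.grad` by the rank's weight before the all-reduce (with `ReduceOp.SUM`), instead of relying on the default uniform averaging.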