Cannot see effectiveness of Zero 3. Any help is much appreciated. #1680
@hpourmodheji, what tool are you using to generate this memory usage report? Can you please profile memory usage in your code using see_memory_usage(), similar to here, and share the resulting logs?
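For readers following along, here is a minimal sketch of how such profiling might be wired into a training step. It assumes `see_memory_usage` is importable from `deepspeed.runtime.utils` and is called with `force=True` so that it actually logs. The helper names `log_memory` and `profiled_step`, and the three callables passed in, are hypothetical placeholders and not part of DeepSpeed.

```python
def log_memory(tag, force=True):
    """Log GPU memory via DeepSpeed if it is installed (hypothetical wrapper)."""
    try:
        # Assumption: this is where DeepSpeed exposes the helper.
        from deepspeed.runtime.utils import see_memory_usage
    except ImportError:
        print(f"[deepspeed not installed] {tag}")
        return False
    # Without force=True the call returns without logging anything.
    see_memory_usage(tag, force=force)
    return True


def profiled_step(forward, backward, optimizer_step, log=log_memory):
    """Run one training step, logging memory around each phase.

    forward, backward and optimizer_step are placeholder callables
    standing in for the real training code.
    """
    log("before forward")
    loss = forward()
    log("after forward")
    backward(loss)
    log("after backward")
    optimizer_step()
    log("after optimizer step")
    return loss
```

With a DeepSpeed engine, the callables would presumably wrap the engine's forward call, `engine.backward(loss)`, and `engine.step()`. Comparing the logs between phases shows whether the peak comes from the model state or from something else, such as activations.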
@hpourmodheji, thanks for sharing this table. There are two things to consider.
So, it is more precise to say that ZeRO 3 linearly reduces the memory consumption of the model and optimizer states.
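A rough back-of-the-envelope sketch of why this matters for peak memory: it uses the ~4400 MB model state and 8 GPUs reported in the original post, and assumes that everything else (for example activations and temporary buffers) is not partitioned. The 3000 MB figure for that remainder is purely hypothetical and chosen only to illustrate the arithmetic.

```python
def per_gpu_peak_mb(model_state_mb, other_mb, num_gpus):
    """Estimate per-GPU peak memory when only the model state is partitioned."""
    return model_state_mb / num_gpus + other_mb


MODEL_STATE_MB = 4400  # reported in the original post
NUM_GPUS = 8           # reported in the original post
OTHER_MB = 3000        # hypothetical: activations, temporary buffers, etc.

baseline = per_gpu_peak_mb(MODEL_STATE_MB, OTHER_MB, 1)      # 7400.0
zero3 = per_gpu_peak_mb(MODEL_STATE_MB, OTHER_MB, NUM_GPUS)  # 3550.0

model_state_reduction = MODEL_STATE_MB / (MODEL_STATE_MB / NUM_GPUS)  # 8.0
peak_reduction = baseline / zero3                                     # about 2.08

print(f"model state reduction: {model_state_reduction:.2f}x")
print(f"peak memory reduction: {peak_reduction:.2f}x")
```

Under these assumptions the model state shrinks 8x while the peak shrinks only about 2x, because the unpartitioned remainder dominates once the model state is small.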
Hello Community, DeepSpeed Team and @tjruwase,
I tried to train the large Bing BERT model (~300M parameters) on 8 GPUs (12 GB GeForce GTX 1080 Ti) to measure the memory reduction from ZeRO 3. The model state shrinks linearly (8x, from ~4400 MB to ~550 MB); however, the peak memory consumption during training does not decrease by the same factor. What am I missing that keeps me from seeing the full effect of ZeRO 3? Any help is very much appreciated, as this issue is a roadblock for my research.
Here are the images:
Baseline - GPU Memory Consumption during Training with single GPU:
ZeRO 3 - GPU Memory Consumption during Training on 8 GPUs with ZeRO 3 enabled:
Best,
Hossein