[BUG] [CPU Memory OOM] DeepSeek R1 gets OS oom-kill when packing model.layers #1355
Comments
@ShiningMaker We need the following:
For DeepSeek V3/R1 BF16, you should have 1.5TB of CPU memory to avoid OOM. |
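Rough arithmetic backs this figure up; the ~671B parameter count below is an assumption based on the commonly cited size of DeepSeek V3/R1, not something stated in this thread:

```python
params = 671e9                        # ~671B parameters (assumed)
bf16_bytes = params * 2               # BF16 stores 2 bytes per parameter
print(f"{bf16_bytes / 1e12:.2f} TB")  # -> ~1.34 TB for the weights alone
```

Quantization buffers and the packed output add to that peak, which is why 1.5TB is a floor rather than a comfortable target.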
@ShiningMaker I noticed the OOM'd process had 2TB of memory mapped, which includes mmap/disk-backed memory. What is the maximum CPU memory on your VM or compute instance? 2TB should be more than enough even for DeepSeek R1.
I restarted GPTQ to perform int8 quantization. While quantizing layer 11/60, I checked the memory usage with free -h. My understanding is that 2TB of memory should be sufficient for R1's requirements. However, during the packing process there may also be an int8 copy of the model held on the CPU; the earlier OOM occurred while I was packing layer 8/60.
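For finer-grained numbers than free -h, a minimal sketch using psutil can log this process's RSS around each layer (the hook point and label are hypothetical, not part of GPTQModel's API):

```python
import psutil

proc = psutil.Process()  # the current Python process

def log_rss(tag: str) -> None:
    rss_gib = proc.memory_info().rss / 2**30  # resident set size in GiB
    print(f"[{tag}] RSS = {rss_gib:.1f} GiB")

# e.g. call log_rss(f"layer {i}/60") before and after packing each layer
```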
Are you saying the OOM happened during packing? Also, I see that there are 400GB of buffer cache (disk cache). You can free those by dropping the kernel page cache. |
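For reference, a sketch of dropping the Linux page cache; this assumes the standard drop_caches mechanism is what was meant here. It requires root, frees only clean cache pages, and the kernel reclaims this cache automatically under memory pressure anyway, so it mainly makes free -h readings easier to interpret:

```python
import subprocess

subprocess.run(["sync"], check=True)         # flush dirty pages to disk first
with open("/proc/sys/vm/drop_caches", "w") as f:
    f.write("3\n")                           # 3 = page cache + dentries/inodes
```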
No, no, no. What I mean is that the int8 model generated during the packing process coexists in memory with the original BF16 model (is this situation possible?). And to release memory, would explicitly calling gc.collect() in the packing code of gptqmodel work? |
@ShiningMaker You can try and test calling gc.collect(). |
@Qubitium I also encountered the same problem. I believe that after packing a layer, deleting the corresponding FP32 fake-quantized weights and releasing the CPU memory could help when packing large models like DeepSeek-V3. However, I'm not sure how to achieve this.
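A minimal sketch of that idea; `layers`, `pack_layer`, and `quantizers` are hypothetical names used for illustration, not GPTQModel's actual API:

```python
import gc
import torch

for name, layer in layers:                # assumed iterable of (name, module)
    pack_layer(layer, quantizers[name])   # hypothetical per-layer packing call
    quantizers[name] = None               # drop the FP32 fake-quant weights
    del layer                             # release the module reference
    gc.collect()                          # collect the now-unreferenced tensors
    torch.cuda.empty_cache()              # return cached GPU blocks, if any
```

Note that gc.collect() only helps if no other references to the FP32 tensors survive, which is why the explicit `del` / `None` assignments come first.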
Describe the bug
From my dmesg output, it is evident that the GPTQ Python process (PID 1179327) was killed by the kernel's OOM (Out of Memory) killer after the system ran out of memory.
GPU Info
NVIDIA H20
Software Info
Show output of:
Model
DeepSeek-R1-BF16 from Hugging Face
To Reproduce
Expected behavior
A clear and concise description of what you expected to happen.
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
When I asked GPT-4o, its reply was as follows:
Possible reasons:
Memory leak: Your Python program might have a memory leak, causing it to continuously consume memory during processing without releasing some objects that are no longer needed.
Handling large amounts of data: The program may be loading or processing a large amount of data, and if the memory requirements exceed the available memory of the system, it will trigger an OOM (Out of Memory) error.
Concurrent operations: If you are dealing with multiple processes or threads, it could lead to increased memory requirements.
Summary
OOM (Out of Memory) kills caused by insufficient memory are a common problem, especially in tasks involving large amounts of data. It is recommended that you start by checking your code for potential memory leaks and optimizing how memory is used.