Another easy win would be to let the user exclude some configurations from the measurement. When quantizing at 8 bpw, it's unlikely that the 2.12 bpw option will ever be selected, so the user could exclude it from the measurement.
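A minimal sketch of such a filter, applied to the list of candidate configurations before the measurement loop runs. `QuantOption`, `select_options`, and the `min_bpw`/`exclude` parameters are hypothetical names for illustration, not existing exllamav2 options. The idea is that every option dropped here is one fewer quantization trial per layer.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class QuantOption:
    """Hypothetical stand-in for one candidate config the measurement evaluates."""
    name: str
    bpw: float


def select_options(
    options: Iterable[QuantOption],
    min_bpw: Optional[float] = None,
    exclude: Iterable[str] = (),
) -> List[QuantOption]:
    """Drop options the user excluded by name or that fall below min_bpw."""
    excluded = set(exclude)
    kept = [
        opt for opt in options
        if opt.name not in excluded and (min_bpw is None or opt.bpw >= min_bpw)
    ]
    # Refuse to measure nothing: the optimizer needs at least one option.
    if not kept:
        raise ValueError("every measurement option was excluded")
    return kept
```

For an 8 bpw target, something like `select_options(options, min_bpw=4.0)` would skip the lowest-bitrate trials entirely. The trade-off is that the optimizer can then only choose among the options that were actually measured.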
I recently started creating my own EXL2 quants and ran into this issue as well: only one GPU (out of the four I have) is used during the measurement.
I also noticed that the quant is generated layer by layer, so potentially not only the measurement but also the conversion itself could be spread across all available GPUs.
On my configuration that could be up to a 4x speedup, which would make creating EXL2 quants much easier.
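The 4x figure is an upper bound. As a rough check using Amdahl's law (with an assumed, not measured, fraction), if a fraction p of the conversion time can be split across N GPUs and the rest stays serial, the overall speedup is:

$$
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}
$$

For example, with N = 4 and a hypothetical p = 0.9, S = 1 / (0.1 + 0.225) ≈ 3.1.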
Problem
The measurement stage of conversion takes a long time.
Solution
Use multiprocessing to spawn n processes, one per GPU, so the measurement runs on all available GPUs.
Alternatives
No response
Explanation
It would help if multiple GPUs could be involved in conversion, measuring n tensors at a time across however many GPUs are available.
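A minimal sketch of the proposed approach, assuming the measurement can be split into independent jobs (for example, the tensors of one layer once that layer's calibration inputs are available). `measure_fn(job, device)` is a hypothetical stand-in for whatever quantizes one tensor on a given device and returns its per-configuration error; none of the names below are part of the exllamav2 API. Jobs are dealt round-robin to devices, and the pool has one worker process per device.

```python
import multiprocessing as mp
from typing import Callable, Dict, Hashable, List, Sequence

# Hypothetical: measure_fn(job, device) quantizes one job (e.g. one tensor)
# on `device` and returns {config_name: error}. It must be a module-level
# function so it can be pickled and sent to a worker process.
MeasureFn = Callable[[Hashable, str], Dict[str, float]]


def assign_jobs(jobs: Sequence[Hashable], devices: Sequence[str]) -> Dict[str, List[Hashable]]:
    """Deal jobs round-robin across devices: job i goes to devices[i % n]."""
    if not devices:
        raise ValueError("at least one device is required")
    plan: Dict[str, List[Hashable]] = {d: [] for d in devices}
    for i, job in enumerate(jobs):
        plan[devices[i % len(devices)]].append(job)
    return plan


def run_worker(device: str, jobs: List[Hashable], measure_fn: MeasureFn) -> Dict[Hashable, Dict[str, float]]:
    """Runs inside one worker process: measure every assigned job on one device."""
    return {job: measure_fn(job, device) for job in jobs}


def measure_in_parallel(jobs: Sequence[Hashable], devices: Sequence[str],
                        measure_fn: MeasureFn) -> Dict[Hashable, Dict[str, float]]:
    """Measure all jobs with one process per device; results keep input order."""
    plan = assign_jobs(jobs, devices)
    # "spawn" instead of "fork": CUDA cannot be re-initialised in a forked child.
    ctx = mp.get_context("spawn")
    merged: Dict[Hashable, Dict[str, float]] = {}
    with ctx.Pool(processes=len(devices)) as pool:
        pending = [pool.apply_async(run_worker, (dev, dev_jobs, measure_fn))
                   for dev, dev_jobs in plan.items()]
        for result in pending:
            merged.update(result.get())  # re-raises any worker exception here
    return {job: merged[job] for job in jobs}
```

A call such as `measure_in_parallel(tensor_names, [f"cuda:{i}" for i in range(torch.cuda.device_count())], measure_tensor)` (with `tensor_names` and `measure_tensor` hypothetical) would spread the work over every visible GPU. It must run under an `if __name__ == "__main__":` guard, because the "spawn" start method re-imports the main module in each child. Round-robin assignment assumes jobs take roughly equal time; a shared work queue would balance uneven jobs better.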
Examples
No response
Additional context
No response
Acknowledgements