Skip to content

GPTQModel v2.0.0

Latest
Compare
Choose a tag to compare
@Qubitium Qubitium released this 03 Mar 22:14
· 2 commits to main since this release
c0f9dc0

What's Changed

🎉 GPTQ quantization internals are now broken into multiple stages (processes) for feature expansion.
🎉 Synced Marlin kernel inference quality fix from upstream. Added MARLIN_FP16, lower-quality but faster backend.
🎉 ModelScope support added.
🎉 Logging and cli progress bar output has been revamped with sticky bottom progress.
🎉 Added CI tests to track regression in kernel inference quality and sweep all bits/group_sizes.
🎉 Delegate loggin/progressbar to LogBar pkg.
🐛 Fix ROCm version auto detection in setup install.
🐛 Fixed generation_config.json save and load.
🐛 Fixed Transformers v4.49.0 compat. Fixed compat of models without bos.
🐛 Fixed group_size=-1 and bits=3 packing regression.
🐛 Fixed Qwen 2.5 MoE regressions.

New Contributors

Full Changelog: v1.9.0...v2.0.0