Release GPTQModel v2.0.0 · ModelCloud/GPTQModel

What's Changed

🎉 GPTQ quantization internals are now broken into multiple stages (processes) for feature expansion.
🎉 Synced Marlin kernel inference quality fix from upstream. Added MARLIN_FP16, a lower-quality but faster backend.
🎉 ModelScope support added.
🎉 Logging and CLI progress bar output have been revamped with a progress bar that sticks to the bottom of the terminal.
🎉 Added CI tests to track regression in kernel inference quality and sweep all bits/group_sizes.
🎉 Delegated logging/progress bar to the LogBar package.
🐛 Fixed ROCm version auto-detection in setup install.
🐛 Fixed generation_config.json save and load.
🐛 Fixed Transformers v4.49.0 compatibility. Fixed compatibility with models that have no BOS token.
🐛 Fixed group_size=-1 and bits=3 packing regression.
🐛 Fixed Qwen 2.5 MoE regressions.
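
For readers unfamiliar with the `group_size=-1` and `bits=3` settings mentioned above, the arithmetic behind them can be sketched as follows. This is a minimal illustration, not GPTQModel's packing code: the helper names (`num_groups`, `pack_bits`, `unpack_bits`) are hypothetical, and the contiguous little-endian bit layout is an assumption chosen for clarity, not the layout any particular kernel uses. It shows that `group_size=-1` means one quantization group spanning the whole input dimension, and that 3-bit values do not divide evenly into 32-bit words, so 32 of them occupy exactly three words.

```python
import math
from typing import List


def num_groups(in_features: int, group_size: int) -> int:
    """Number of quantization groups along the input dimension.

    group_size == -1 is treated as a single group covering all in_features.
    """
    if group_size == -1:
        return 1
    if group_size <= 0:
        raise ValueError("group_size must be -1 or a positive integer")
    return math.ceil(in_features / group_size)


def pack_bits(values: List[int], bits: int, word_bits: int = 32) -> List[int]:
    """Pack small unsigned ints into word_bits-wide words as one contiguous bitstream.

    Assumed layout (illustrative only): value i occupies bits [i*bits, (i+1)*bits).
    """
    stream = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << bits):
            raise ValueError(f"value {v} does not fit in {bits} bits")
        stream |= v << (i * bits)
    n_words = math.ceil(len(values) * bits / word_bits)
    mask = (1 << word_bits) - 1
    return [(stream >> (w * word_bits)) & mask for w in range(n_words)]


def unpack_bits(words: List[int], bits: int, count: int, word_bits: int = 32) -> List[int]:
    """Inverse of pack_bits: recover `count` values of width `bits`."""
    stream = 0
    for w, word in enumerate(words):
        stream |= word << (w * word_bits)
    mask = (1 << bits) - 1
    return [(stream >> (i * bits)) & mask for i in range(count)]


if __name__ == "__main__":
    values = [i % 8 for i in range(32)]          # 32 values, each in 0..7
    words = pack_bits(values, bits=3)
    print(len(words))                             # 32 * 3 bits = 96 bits
    print(unpack_bits(words, bits=3, count=32) == values)
    print(num_groups(4096, 128), num_groups(4096, -1))
```

Because 3 does not divide 32, some values straddle a word boundary in this layout, which is one reason 3-bit packing tends to need its own code path and its own regression tests.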

fix 3 bit packing regression, fixed #1278 by @CSY-ModelCloud in #1280
Fix supported models list (syntax error) by @Forenche in #1281
feat: load model from modelscope by @suluyana in #1283
merge eval & utils.lm_eval by @CSY-ModelCloud in #1282
fix modelscope import & tests by @CSY-ModelCloud in #1285
allow passing model instance to evalplus & update tokenizer loading logics by @CSY-ModelCloud in #1284
fix lm-eval & vllm check tokenizer type by @CSY-ModelCloud in #1287
Fix generation_config.json not auto-saved by @Qubitium in #1292
[SAVE] Save config files with empty state dict by @ZX-ModelCloud in #1293
[SAVE] Save processor related config files by @ZX-ModelCloud in #1295
fix wrong order of config save causing sharded tensors to be removed by @Qubitium in #1297
[FIX] not pack when group_size=-1 by @ZX-ModelCloud in #1298
cleanup marlin paths: marlin does conversion on post_init by @Qubitium in #1310
bump tokenicer to v0.0.3 by @CSY-ModelCloud in #1308
clean is_marlin_format for tests by @CSY-ModelCloud in #1311
[CI] fix sglang test name & add status logs & remove exllama packing test by @CSY-ModelCloud in #1312
skip v1 to v2 conversion for sym=True only kernels by @Qubitium in #1314
bump tokenicer to 0.0.4 & remove FORMAT_FIELD_COMPAT_MARLIN by @CSY-ModelCloud in #1315
revert is_marlin_format check by @CSY-ModelCloud in #1316
Improve Marlin accuracy (default) but add MARLIN_FP16 backend for faster inference with less accuracy by @Qubitium in #1317
marlin fp32 mode should also be enabled if kernel was selected due to… by @Qubitium in #1318
refactor logger by @Qubitium in #1319
fix typo by @Qubitium in #1320
refactor logger and have progress bar sticky to bottom of cli by @Qubitium in #1322
[CI] fix tokenicer upgraded transformers & install bitblas for test_save_quanted_model by @CSY-ModelCloud in #1321
[CI] allow to select compiler server & move model test to correct dir by @CSY-ModelCloud in #1323
fix bitblas loading regression by @Qubitium in #1324
marlin fp16 warning missed check by @Qubitium in #1325
fix custom logger overriding system level logger by @Qubitium in #1327
fix progress bar for packing by @CSY-ModelCloud in #1326
More log fixes by @Qubitium in #1328
fix no backend when creating a quant linear by @CSY-ModelCloud in #1329
use relative path instead of importing gptqmodel by @CSY-ModelCloud in #1331
no need patch vllm now by @CSY-ModelCloud in #1332
[CI] fix CI url by @CSY-ModelCloud in #1333
fix oom by @CSY-ModelCloud in #1335
add default value for backend, fix optimum doesn't pass it by @CSY-ModelCloud in #1334
refactor pb and pb usage by @Qubitium in #1341
fix generator has no length info by @CSY-ModelCloud in #1342
replace utils.Progressbar with logbar by @CSY-ModelCloud in #1343
[CI] update UI by @CSY-ModelCloud in #1344
fix logbar api usage by @CSY-ModelCloud in #1345
fix v2 to v1 missed logic bypass by @Qubitium in #1347
[CI] fix xpu env has no logbar by @CSY-ModelCloud in #1346
[CI] update runner ip env & fix show-statistics didn't run by @CSY-ModelCloud in #1348
fix time was not imported by @CSY-ModelCloud in #1349
update device-smi depend to v0.4.0 by @Qubitium in #1351
[CI] install requirements.txt for m4 by @CSY-ModelCloud in #1352
Exllama V1 is Packable by @ZX-ModelCloud in #1356
[FIX] test_packable.py by @ZX-ModelCloud in #1357
[setup] use torch.version.hip for rocm version check by @CSY-ModelCloud in #1360
save/load peft lora by @Qubitium in #1358
update device-smi to 0.4.1 for rocm fix by @Qubitium in #1362
strip model path by @Qubitium in #1363
[CI] exllama v1 kernel now eligible for quant stage by @Qubitium in #1364
Fix transformers modeling code passing input.shape[0] == 0 to nn.module by @Qubitium in #1365
simplify log var by @Qubitium in #1368
fix import by @CSY-ModelCloud in #1369
update by @Qubitium in #1370
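
The ROCm detection change in #1360 relies on `torch.version.hip`, which is a version string on ROCm builds of PyTorch and `None` otherwise. A minimal sketch of how such a string might be reduced to a `(major, minor)` pair is shown below. The helper name `parse_hip_version` and the sample version string are hypothetical, and this is not the code in GPTQModel's `setup.py`; it also assumes the leading `major.minor` of the HIP string is the part a build script cares about.

```python
import re
from typing import Optional, Tuple


def parse_hip_version(hip_version: Optional[str]) -> Optional[Tuple[int, int]]:
    """Return (major, minor) parsed from a HIP version string, or None.

    None or an empty string (non-ROCm build) and unparseable strings yield None.
    """
    if not hip_version:
        return None
    match = re.match(r"(\d+)\.(\d+)", hip_version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # torch is optional here so the sketch also runs where PyTorch is not installed.
    try:
        import torch
    except ImportError:
        torch = None
    hip = getattr(getattr(torch, "version", None), "hip", None)
    print(parse_hip_version(hip))  # None on CPU/CUDA builds or without torch
```

Reading the version from the installed PyTorch build, instead of probing system tools, keeps the check consistent with the runtime the extension will actually link against.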

New Contributors

@Forenche made their first contribution in #1281
@suluyana made their first contribution in #1283

Full Changelog: v1.9.0...v2.0.0
