What's Changed
🎉 GPTQ quantization internals are now broken into multiple stages (processes) for feature expansion.
🎉 Synced Marlin kernel inference quality fix from upstream. Added MARLIN_FP16, a lower-quality but faster backend.
🎉 ModelScope support added.
🎉 Logging and CLI progress bar output have been revamped, with the progress bar now sticky at the bottom.
🎉 Added CI tests to track regression in kernel inference quality and sweep all bits/group_sizes.
🎉 Delegated logging/progress bar to the LogBar package.
🐛 Fixed ROCm version auto detection in setup install.
🐛 Fixed generation_config.json save and load.
🐛 Fixed compatibility with Transformers v4.49.0 and with models that have no BOS token.
🐛 Fixed group_size=-1 and bits=3 packing regression.
🐛 Fixed Qwen 2.5 MoE regressions.
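
For readers unfamiliar with why bits=3 and group_size=-1 are awkward cases, here is a minimal illustrative sketch. It is not GPTQModel's packing code: the helper names (`effective_group_size`, `pack_bits`, `unpack_bits`) are hypothetical, and the contiguous little-endian bitstream layout is an assumption chosen for clarity. It shows that 3-bit values do not divide evenly into 32-bit words (32 values span exactly 3 words), and that group_size=-1 is conventionally read as a single group covering the whole input dimension.

```python
def effective_group_size(group_size: int, in_features: int) -> int:
    """Assumption: group_size=-1 means one group spanning all input features."""
    return in_features if group_size == -1 else group_size


def pack_bits(values: list[int], bits: int) -> list[int]:
    """Pack unsigned ints of width `bits` into 32-bit words (illustrative layout)."""
    mask = (1 << bits) - 1
    stream = 0
    for i, v in enumerate(values):
        if not 0 <= v <= mask:
            raise ValueError(f"value {v} does not fit in {bits} bits")
        # Append each value to one long bitstream, lowest bits first.
        stream |= v << (i * bits)
    n_words = (len(values) * bits + 31) // 32  # round up to whole words
    return [(stream >> (32 * w)) & 0xFFFFFFFF for w in range(n_words)]


def unpack_bits(words: list[int], bits: int, count: int) -> list[int]:
    """Inverse of pack_bits: recover `count` values of width `bits`."""
    stream = 0
    for w, word in enumerate(words):
        stream |= word << (32 * w)
    mask = (1 << bits) - 1
    return [(stream >> (i * bits)) & mask for i in range(count)]


values = [i % 8 for i in range(32)]   # 32 three-bit values
words = pack_bits(values, bits=3)     # 96 bits, so 3 words; values straddle word boundaries
restored = unpack_bits(words, bits=3, count=len(values))
```

Because 3 does not divide 32, some values cross a word boundary, which is the kind of edge case a bits/group_size sweep in CI is meant to catch.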
- fix 3 bit packing regression, fixed #1278 by @CSY-ModelCloud in #1280
- Fix supported models list (syntax error) by @Forenche in #1281
- feat: load model from modelscope by @suluyana in #1283
- merge eval & utils.lm_eval by @CSY-ModelCloud in #1282
- fix modelscope import & tests by @CSY-ModelCloud in #1285
- allow passing model instance to evalplus & update tokenizer loading logics by @CSY-ModelCloud in #1284
- fix lm-eval & vllm check tokenizer type by @CSY-ModelCloud in #1287
- Fix generation_config.json not auto-saved by @Qubitium in #1292
- [SAVE] Save config files with empty state dict by @ZX-ModelCloud in #1293
- [SAVE] Save processor related config files by @ZX-ModelCloud in #1295
- fix wrong order of config save causing sharded tensors to be removed by @Qubitium in #1297
- [FIX] not pack when group_size=-1 by @ZX-ModelCloud in #1298
- cleanup marlin paths: marlin does conversion on post_init by @Qubitium in #1310
- bump tokenicer to v0.0.3 by @CSY-ModelCloud in #1308
- clean is_marlin_format for tests by @CSY-ModelCloud in #1311
- [CI] fix sglang test name & add status logs & remove exllama packing test by @CSY-ModelCloud in #1312
- skip v1 to v2 conversion for sym=True only kernels by @Qubitium in #1314
- bump tokenicer to 0.0.4 & remove FORMAT_FIELD_COMPAT_MARLIN by @CSY-ModelCloud in #1315
- revert is_marlin_format check by @CSY-ModelCloud in #1316
- Improve Marlin accuracy (default) but add MARLIN_FP16 backend for faster with less-accuracy by @Qubitium in #1317
- marlin fp32 mode should also be enabled if kernel was selected due to… by @Qubitium in #1318
- refactor logger by @Qubitium in #1319
- fix typo by @Qubitium in #1320
- refactor logger and have progress bar sticky to bottom of cli by @Qubitium in #1322
- [CI] fix tokenicer upgraded transformers & install bitblas for test_save_quanted_model by @CSY-ModelCloud in #1321
- [CI] allow to select compiler server & move model test to correct dir by @CSY-ModelCloud in #1323
- fix bitblas loading regression by @Qubitium in #1324
- marlin fp16 warning missed check by @Qubitium in #1325
- fix custom logger overriding system level logger by @Qubitium in #1327
- fix progress bar for packing by @CSY-ModelCloud in #1326
- More log fixes by @Qubitium in #1328
- fix no backend when creating a quant linear by @CSY-ModelCloud in #1329
- use relative path instead of importing gptqmodel by @CSY-ModelCloud in #1331
- no need patch vllm now by @CSY-ModelCloud in #1332
- [CI] fix CI url by @CSY-ModelCloud in #1333
- fix oom by @CSY-ModelCloud in #1335
- add default value for backend, fix optimum doesn't pass it by @CSY-ModelCloud in #1334
- refactor pb and pb usage by @Qubitium in #1341
- fix generator has no length info by @CSY-ModelCloud in #1342
- replace utils.Progressbar with logbar by @CSY-ModelCloud in #1343
- [CI] update UI by @CSY-ModelCloud in #1344
- fix logbar api usage by @CSY-ModelCloud in #1345
- fix v2 to v1 missed logic bypass by @Qubitium in #1347
- [CI] fix xpu env has no logbar by @CSY-ModelCloud in #1346
- [CI] update runner ip env & fix show-statistics didn't run by @CSY-ModelCloud in #1348
- fix time was not imported by @CSY-ModelCloud in #1349
- update device-smi depend to v0.4.0 by @Qubitium in #1351
- [CI] install requirements.txt for m4 by @CSY-ModelCloud in #1352
- Exllama V1 is Packable by @ZX-ModelCloud in #1356
- [FIX] test_packable.py by @ZX-ModelCloud in #1357
- [setup] use torch.version.hip for rocm version check by @CSY-ModelCloud in #1360
- save/load peft lora by @Qubitium in #1358
- update device-smi to 0.4.1 for rocm fix by @Qubitium in #1362
- strip model path by @Qubitium in #1363
- [CI] exllama v1 kernel now eligible for quant stage by @Qubitium in #1364
- Fix transformers modeling code passing input.shape[0] == 0 to nn.module by @Qubitium in #1365
- simplify log var by @Qubitium in #1368
- fix import by @CSY-ModelCloud in #1369
- update by @Qubitium in #1370
Full Changelog: v1.9.0...v2.0.0