completing the "bitsandbytes" option - based on https://docs.vllm.ai/en/stable/quantization/bnb.html #147

Open
wants to merge 7 commits into base: main
Conversation

mohamednaji7

In https://docs.vllm.ai/en/stable/quantization/bnb.html there are 3 things to do (see the sketch below):

  1. Install bitsandbytes>=0.45.0
  2. Set quantization="bitsandbytes"
  3. Set load_format="bitsandbytes"
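
For reference, a minimal offline-inference sketch roughly following the linked docs, assuming a vLLM version where load_format is still passed as a separate engine argument (bitsandbytes>=0.45.0 must be installed):

```python
import torch
from vllm import LLM

# unsloth/tinyllama-bnb-4bit is a pre-quantized bitsandbytes checkpoint.
llm = LLM(
    model="unsloth/tinyllama-bnb-4bit",
    dtype=torch.bfloat16,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)
```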

Current state for runpod-workers/worker-vllm:main

load_format="bitsandbytes" is already available

mohamednaji7/worker-vllm:main is ahead with:

  1. added bitsandbytes>=0.45.0 to requirements.txt
  2. [after checking args.load_format] forced args.quantization, since it is the only quantization supported for load_format="bitsandbytes" (see the sketch after this list)
  3. updated the QUANTIZATION row in README.md
  4. updated the QUANTIZATION option in worker-config.json
  5. updated to typing-extensions>=4.8.0, since there is a compatibility issue between bitsandbytes>=0.45.0 and typing-extensions==4.7.1.
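
As a rough illustration of item 2, a minimal sketch of the forcing logic; the function name and the args object here are hypothetical, not the worker's actual code:

```python
def enforce_bnb_quantization(args):
    # vLLM requires quantization="bitsandbytes" whenever
    # load_format="bitsandbytes", so force it instead of failing later
    # with a ValueError at engine startup.
    if getattr(args, "load_format", None) == "bitsandbytes":
        args.quantization = "bitsandbytes"
    return args
```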

mohamednaji7 and others added 6 commits January 21, 2025 13:49
```
2025-01-22 18:04:01 [INFO] > [stage-0 5/8] RUN --mount=type=cache,target=/root/.cache/pip python3 -m pip install --upgrade pip && python3 -m pip install --upgrade -r /requirements.txt:
2025-01-22 18:04:01 [INFO] runpod-workers#13 8.904
2025-01-22 18:04:01 [INFO] runpod-workers#13 8.904 The conflict is caused by:
2025-01-22 18:04:01 [INFO] runpod-workers#13 8.904 The user requested typing-extensions==4.7.1
2025-01-22 18:04:01 [INFO] runpod-workers#13 8.904 bitsandbytes 0.45.0 depends on typing_extensions>=4.8.0
```
@pandyamarut
Collaborator

How did you test this? Can you please share, @mohamednaji7?

@mohamednaji7
Author

I deployed a RunPod serverless endpoint using the repo with the environment variables set, and it works.
I tested and got the errors below using the same deployment.

Device: GPU.
I made a test now with the current worker-vllm, with QUANTIZATION left as None.
(screenshots omitted)
Logs: logs - worker-vllm.txt

2025-02-01T09:19:14.726126817Z {"requestId": null, "message": "Uncaught exception | <class 'ValueError'>; BitsAndBytes load format and QLoRA adapter only support 'bitsandbytes' quantization, but got None; <traceback object at 0x700e46dea980>;", "level": "ERROR"}

Deployment using this repo, setting the env vars:
MODEL_NAME : unsloth/tinyllama-bnb-4bit
LOAD_FORMAT : bitsandbytes
QUANTIZATION : bitsandbytes
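
For context, a hypothetical sketch of how these env vars map onto vLLM engine arguments; AsyncEngineArgs is real vLLM API, but the mapping shown here is an assumption, not worker-vllm's actual implementation:

```python
import os
from vllm import AsyncEngineArgs

engine_args = AsyncEngineArgs(
    model=os.environ["MODEL_NAME"],          # unsloth/tinyllama-bnb-4bit
    load_format=os.getenv("LOAD_FORMAT"),    # "bitsandbytes"
    quantization=os.getenv("QUANTIZATION"),  # must also be "bitsandbytes";
                                             # leaving it None triggers the
                                             # ValueError shown above
)
```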

Result: (screenshot omitted)

Logs: logs - bitsandbytes.txt

The build was done beforehand (screenshots omitted); build logs: build-logs-worker-vllm -fb.txt

@mohamednaji7
Author

@pandyamarut I described my tests above.
