completing the "bitsandbytes" option - based on https://docs.vllm.ai/en/stable/quantization/bnb.html #147

Open
wants to merge 7 commits into base: main
Conversation

mohamednaji7

In https://docs.vllm.ai/en/stable/quantization/bnb.html there are 3 things to do (see the sketch below):

  1. Install bitsandbytes>=0.45.0
  2. Set quantization="bitsandbytes"
  3. Set load_format="bitsandbytes"
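
For reference, a minimal offline-inference sketch roughly following the linked docs, assuming a vLLM version where load_format is still passed as a separate engine argument (bitsandbytes>=0.45.0 must be installed):

```python
import torch
from vllm import LLM

# unsloth/tinyllama-bnb-4bit is a pre-quantized bitsandbytes checkpoint.
llm = LLM(
    model="unsloth/tinyllama-bnb-4bit",
    dtype=torch.bfloat16,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)
```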

Current state for runpod-workers/worker-vllm:main

load_format="bitsandbytes" is already available

mohamednaji7/worker-vllm:main is ahead with:

  1. added bitsandbytes>=0.45.0 to requirements.txt
  2. [after checking args.load_format] forced args.quantization, since it is the only quantization supported for load_format="bitsandbytes" (see the sketch after this list)
  3. updated the QUANTIZATION row in README.md
  4. updated the QUANTIZATION option in worker-config.json
  5. updated to typing-extensions>=4.8.0, since there is a compatibility issue between bitsandbytes>=0.45.0 and typing-extensions==4.7.1.
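
As a rough illustration of item 2, a minimal sketch of the forcing logic; the function name and the args object here are hypothetical, not the worker's actual code:

```python
def enforce_bnb_quantization(args):
    # vLLM requires quantization="bitsandbytes" whenever
    # load_format="bitsandbytes", so force it instead of failing later
    # with a ValueError at engine startup.
    if getattr(args, "load_format", None) == "bitsandbytes":
        args.quantization = "bitsandbytes"
    return args
```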

mohamednaji7 and others added 6 commits January 21, 2025 13:49
```
2025-01-22 18:04:01 [INFO] > [stage-0 5/8] RUN --mount=type=cache,target=/root/.cache/pip python3 -m pip install --upgrade pip && python3 -m pip install --upgrade -r /requirements.txt:
2025-01-22 18:04:01 [INFO] runpod-workers#13 8.904
2025-01-22 18:04:01 [INFO] runpod-workers#13 8.904 The conflict is caused by:
2025-01-22 18:04:01 [INFO] runpod-workers#13 8.904 The user requested typing-extensions==4.7.1
2025-01-22 18:04:01 [INFO] runpod-workers#13 8.904 bitsandbytes 0.45.0 depends on typing_extensions>=4.8.0
```
@pandyamarut
Collaborator

How did you test this? Can you please share, @mohamednaji7?

@mohamednaji7
Author

I deployed a RunPod serverless endpoint using the repo with the environment variables set, and it works.
I tested and got the errors below using the same deployment.

Device: GPU.
I made a test now with the current worker-vllm, with QUANTIZATION left as None.
(screenshots omitted)
Logs: logs - worker-vllm.txt

2025-02-01T09:19:14.726126817Z {"requestId": null, "message": "Uncaught exception | <class 'ValueError'>; BitsAndBytes load format and QLoRA adapter only support 'bitsandbytes' quantization, but got None; <traceback object at 0x700e46dea980>;", "level": "ERROR"}

Deployment using this repo, setting the env vars:
MODEL_NAME : unsloth/tinyllama-bnb-4bit
LOAD_FORMAT : bitsandbytes
QUANTIZATION : bitsandbytes
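
For context, a hypothetical sketch of how these env vars map onto vLLM engine arguments; AsyncEngineArgs is real vLLM API, but the mapping shown here is an assumption, not worker-vllm's actual implementation:

```python
import os
from vllm import AsyncEngineArgs

engine_args = AsyncEngineArgs(
    model=os.environ["MODEL_NAME"],          # unsloth/tinyllama-bnb-4bit
    load_format=os.getenv("LOAD_FORMAT"),    # "bitsandbytes"
    quantization=os.getenv("QUANTIZATION"),  # must also be "bitsandbytes";
                                             # leaving it None triggers the
                                             # ValueError shown above
)
```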

Result: (screenshot omitted)

Logs: logs - bitsandbytes.txt

The build was done beforehand (screenshots omitted); build logs: build-logs-worker-vllm -fb.txt

@mohamednaji7
Author

@pandyamarut I described my tests above.
