completing the "bitsandbytes" option - based on https://docs.vllm.ai/en/stable/quantization/bnb.html #147
According to https://docs.vllm.ai/en/stable/quantization/bnb.html, there are three things to do:

- `bitsandbytes>=0.45.0`
- `quantization="bitsandbytes"`
- `load_format="bitsandbytes"`
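For reference, the linked vLLM docs page combines the two engine arguments roughly as in the sketch below (the model name is only a placeholder and not part of this PR):

```python
from vllm import LLM

# Sketch based on the vLLM bitsandbytes docs page linked above;
# the checkpoint name is just an example.
llm = LLM(
    model="unsloth/tinyllama-bnb-4bit",
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)
```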
Current state of `runpod-workers/worker-vllm:main`: the `load_format="bitsandbytes"` part is available.

`mohamednaji7/worker-vllm:main` is ahead with:

- `bitsandbytes>=0.45.0` added to `requirements.txt`
- `args.load_format` forcing `args.quantization`, since "bitsandbytes" is the only quantization for `load_format="bitsandbytes"` (a sketch of this follows the list)
- a `QUANTIZATION` row in `README.md`
- a `QUANTIZATION` option in `worker-config.json`
- `typing-extensions>=4.8.0`, since there is a compatibility issue between `bitsandbytes>=0.45.0` and `typing-extensions==4.7.1`
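A minimal sketch of the forcing logic, assuming the worker builds its vLLM engine args in one place (the function name and placement are assumptions, not the literal worker-vllm code):

```python
# Hypothetical sketch, not the exact worker-vllm implementation.
def apply_bitsandbytes_defaults(args):
    """Force the quantization method when the bitsandbytes load format is requested,
    since it is the only quantization compatible with load_format="bitsandbytes"."""
    if getattr(args, "load_format", None) == "bitsandbytes":
        args.quantization = "bitsandbytes"
    return args
```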