vLLM Compatibility Issue with Unsloth's 4-bit Quantized Models - Shape Mismatch During Weight Loading
#9 · by varun12345 · opened
vLLM fails to load Unsloth's 4-bit quantized vision-language models due to a shape mismatch between expected quantized weights and loaded unquantized weights during the weight loading process.
Reproduction Steps
1. Install vLLM 0.10.0 (or 0.10.1.1).
2. Run: `vllm serve "unsloth/Qwen2.5-VL-7B-Instruct-unsloth-bnb-4bit" --port 8001`
3. An assertion failure occurs during model loading.
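For context on why this kind of assertion can fire: bitsandbytes-style 4-bit checkpoints pack two 4-bit values into each uint8 byte and flatten the tensor, so the serialized parameter's shape no longer matches the original fp16 weight shape the model definition declares. A minimal sketch of the mismatch (the layer dimensions below are hypothetical, not taken from the actual Qwen2.5-VL model):

```python
# Hedged sketch: why a loader expecting the original fp16 weight shape
# can trip a shape assertion on a bnb-4bit checkpoint. Dimensions are
# hypothetical illustration values, not from the real model.

out_features, in_features = 3584, 3584           # hypothetical linear layer
fp16_shape = (out_features, in_features)         # shape the model definition expects

# 4-bit packing: two 4-bit values per uint8 byte, stored flattened as (N, 1),
# so the checkpoint tensor has half as many elements and a different rank.
packed_elems = (out_features * in_features) // 2
packed_shape = (packed_elems, 1)

print(fp16_shape)    # (3584, 3584)
print(packed_shape)  # (6422528, 1)

# A naive shape check in the weight loader fails at exactly this point:
assert fp16_shape != packed_shape
```

This is only an illustration of the shape arithmetic; the actual fix depends on whether the vLLM version being used recognizes the `bitsandbytes` quantization config for this model architecture.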
It's a little urgent; if anyone has faced this issue before, please let me know.
