vLLM Compatibility Issue with Unsloth's 4-bit Quantized Models - Shape Mismatch During Weight Loading

#9
by varun12345 - opened

vLLM fails to load Unsloth's 4-bit quantized vision-language models due to a shape mismatch between expected quantized weights and loaded unquantized weights during the weight loading process.
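
To make the failure mode concrete, here is a minimal, self-contained sketch (not vLLM's actual loader code) of why the assertion trips: bitsandbytes stores a 4-bit weight packed two values per uint8 byte, so the checkpoint tensor's shape no longer matches the module's logical weight shape. The layer size below is hypothetical, chosen only for illustration.

```python
# Illustrative sketch, assuming bitsandbytes-style 4-bit packing:
# a quantized checkpoint stores each weight as a flat uint8 tensor
# of shape (numel // 2, 1), not the module's logical 2-D shape.
logical_shape = (3584, 3584)  # hypothetical projection-layer size
logical_numel = logical_shape[0] * logical_shape[1]

# Packed 4-bit storage: two 4-bit values per byte.
packed_shape = (logical_numel // 2, 1)

def loader_shape_check(expected, loaded):
    # Simplified stand-in for the loader's strict shape assertion.
    assert expected == loaded, f"shape mismatch: {expected} vs {loaded}"

try:
    loader_shape_check(logical_shape, packed_shape)
except AssertionError as e:
    print(e)  # the loader aborts here when the shapes disagree
```

A loader that understands the quantization scheme would reconcile the packed shape against the logical shape instead of asserting exact equality, which is why the model loads fine in libraries that are bnb-aware.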

Reproduction Steps
1. Install vLLM 0.10.0 (or 0.10.1.1).
2. Run: vllm serve "unsloth/Qwen2.5-VL-7B-Instruct-unsloth-bnb-4bit" --port 8001
3. The error occurs during model loading with the following assertion failure:

[Screenshot: assertion failure traceback during model loading]

It's a little urgent; if anyone has faced this issue before, please let me know.
