Qwen3.5-4B — FP8 (W8A8) Quantized

FP8 dynamic quantization of Qwen/Qwen3.5-4B using llmcompressor.

Quantization details

Evaluated on HongxinLi/ScreenSpot_v2 (1,272 samples).

Source	BF16	FP8	Delta
android	73.9%	69.2%	-4.7%
ios	78.2%	67.6%	-10.5%
forum	45.6%	31.6%	-13.9%
gitlab	53.4%	46.6%	-6.8%
macos	53.1%	53.7%	+0.7%
windows	47.2%	49.1%	+1.9%
shop	43.3%	39.6%	-3.7%
tool	35.8%	30.8%	-5.0%
Overall	56.5%	51.6%	-4.9%

from vllm import LLM
llm = LLM(
    model="Shashwat42/Qwen3.5-4B-FP8",
    quantization="compressed-tensors",
    dtype="bfloat16",
)

Or serve:

vllm serve Shashwat42/Qwen3.5-4B-FP8 \
  --quantization compressed-tensors \
  --dtype bfloat16

Safetensors

Model size

5B params

Tensor type

BF16

F8_E4M3

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Base model

Finetuned

Quantized

(263)

this model