Qwen3-VL-Reranker-2B-NVFP4

An NVFP4 (W4A16) quantized version of Qwen3-VL-Reranker-2B, produced with NVIDIA ModelOpt.

Quantization Details

Item	Value
Base model	Qwen/Qwen3-VL-Reranker-2B
Quantization tool	NVIDIA ModelOpt v0.44.0
Quantization format	W4A16 NVFP4 — weights in FP4, activations in BF16
Model size	4.0 GB (BF16) → 2.1 GB (NVFP4)
Weight block size	16
Skipped layers	`lm_head`, `model.visual*` (vision encoder)
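
To make the format in the table concrete, here is a minimal NumPy sketch of block-scaled FP4 quantization with a block size of 16. It is an illustration only, not ModelOpt's implementation: the function names are hypothetical, the block scales are kept in full precision (real NVFP4 stores them in FP8 alongside a per-tensor scale), and the input is assumed to be a 1-D array whose length is a multiple of the block size.

```python
import numpy as np

# Magnitudes representable by FP4 E2M1 (the sign is handled separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK_SIZE = 16  # matches "Weight block size" in the table


def fake_quantize_fp4(weights: np.ndarray, block_size: int = BLOCK_SIZE) -> np.ndarray:
    """Quantize then dequantize a 1-D weight vector with per-block scaling."""
    blocks = weights.reshape(-1, block_size)
    # One scale per block maps the block's largest magnitude onto 6.0,
    # the largest E2M1 value. All-zero blocks get a dummy scale of 1.0.
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0)
    scaled = np.abs(blocks) / scale
    # Snap each scaled magnitude to the nearest representable grid point.
    idx = np.abs(scaled[..., None] - E2M1_GRID).argmin(axis=-1)
    return (np.sign(blocks) * E2M1_GRID[idx] * scale).reshape(weights.shape)


def bits_per_weight(block_size: int = BLOCK_SIZE, scale_bits: int = 8) -> float:
    """4 bits per value plus one scale (assumed 8 bits) shared by each block."""
    return 4 + scale_bits / block_size
```

Under these assumptions a quantized weight costs about 4.5 bits instead of 16. Layers that stay in BF16, such as the skipped ones listed in the table, are one reason the checkpoint shrinks by less than that ratio.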

Hardware Requirements

Runs on NVIDIA Ampere and newer GPUs through the Marlin FP4 kernel. Blackwell GPUs, which support FP4 natively, can provide additional performance benefits.
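
A quick way to see which tier a given GPU falls into is to compare its CUDA compute capability against the first Ampere (8.0) and first Blackwell (10.0) values. The sketch below is illustrative: the helper names are hypothetical, the thresholds come from NVIDIA's compute-capability numbering rather than from this model card, and which kernel vLLM actually selects may depend on the vLLM version.

```python
from typing import Optional, Tuple

AMPERE = (8, 0)      # first compute capability of the Ampere generation
BLACKWELL = (10, 0)  # first compute capability of the Blackwell generation


def classify_gpu(capability: Tuple[int, int]) -> str:
    """Map a (major, minor) compute capability to a coarse support tier."""
    if capability >= BLACKWELL:
        return "blackwell-or-newer"
    if capability >= AMPERE:
        return "ampere-or-newer"
    return "unsupported"


def current_capability() -> Optional[Tuple[int, int]]:
    """Return the capability of GPU 0, or None if PyTorch or CUDA is missing."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(0)
    return (major, minor)
```

Calling `classify_gpu(current_capability())` on a machine with a CUDA device gives the tier; on a machine without one, `current_capability()` returns `None` and should be handled first.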

Usage (vLLM)

Start Reranker Server

vllm serve jeffpeng3/Qwen3-VL-Reranker-2B-NVFP4 \
  --runner pooling \
  --hf_overrides '{"architectures": ["Qwen3VLForSequenceClassification"],"classifier_from_token": ["no", "yes"],"is_original_qwen3_reranker": true}' \
  --quantization modelopt \
  --dtype bfloat16 \
  --max-model-len 1024
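
Once the server is running, it can be queried over HTTP. The sketch below uses only the Python standard library and assumes vLLM's defaults: the server listens on `localhost:8000` and exposes a rerank endpoint at `/v1/rerank` that returns a `results` list. These details and the helper names are assumptions, not taken from this model card, so check them against the vLLM version in use.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: vLLM's default host and port
MODEL = "jeffpeng3/Qwen3-VL-Reranker-2B-NVFP4"


def build_rerank_payload(query, documents, model=MODEL):
    """Build the JSON body for a text-only rerank request."""
    return {"model": model, "query": query, "documents": list(documents)}


def rerank(query, documents, base_url=BASE_URL):
    """POST the request and return the server's list of results."""
    body = json.dumps(build_rerank_payload(query, documents)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/rerank",  # assumption: endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]
```

With the server up, `rerank("a dog on a beach", ["A golden retriever by the sea.", "Mars is red."])` should return one entry per document.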

Score API Example

from vllm import LLM

llm = LLM(
    model="jeffpeng3/Qwen3-VL-Reranker-2B-NVFP4",
    runner="pooling",
    dtype="bfloat16",
    max_model_len=1024,
    hf_overrides={
        "architectures": ["Qwen3VLForSequenceClassification"],
        "classifier_from_token": ["no", "yes"],
        "is_original_qwen3_reranker": True,
    },
)

query = "A woman playing with her dog on a beach at sunset."
documents = [
    {"text": "A woman shares a joyful moment with her golden retriever on a sun-drenched beach."},
    {"text": "Mars is known as the Red Planet."},
]

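# Score each document against the query; a higher score means a closer match.
for doc in documents:
    outputs = llm.score(
        query,
        {"content": [{"type": "text", "text": doc["text"]}]},
        chat_template="additional_chat_templates/reranker.jinja",
    )
    # One (query, document) pair was scored, so read the first result.
    print(f"Score: {outputs[0].outputs.score}")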
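
The loop above prints one score per document. To turn those scores into a ranking, the texts and scores can be collected and sorted. The helper below is a hypothetical convenience function, not part of vLLM, and the scores in the usage note are placeholders rather than real model output.

```python
from typing import List, Optional, Sequence, Tuple


def rank_documents(
    texts: Sequence[str],
    scores: Sequence[float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Pair each text with its score and sort from most to least relevant."""
    if len(texts) != len(scores):
        raise ValueError("texts and scores must have the same length")
    ranked = sorted(zip(texts, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]
```

For example, `rank_documents(["doc a", "doc b"], [0.1, 0.9], top_k=1)` keeps only the highest-scoring document.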

See the base model card for detailed usage and benchmarks.

Citation

@article{qwen3vlembedding,
  title={Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking},
  author={Li, Mingxin and Zhang, Yanzhao and Long, Dingkun and Chen, Keqin and Song, Sibo and Bai, Shuai and Yang, Zhibo and Xie, Pengjun and Yang, An and Liu, Dayiheng and Zhou, Jingren and Lin, Junyang},
  journal={arXiv preprint arXiv:2601.04720},
  year={2026}
}