Qwen3-Embedding-8B — NVFP4

NVFP4 (W4A4) quantization of Qwen/Qwen3-Embedding-8B, in compressed-tensors format. Runs directly in vLLM.

Calibration was run on multi-turn conversation prompts (concatenated user turns from full conversations) — not query/document pairs.

Downloads last month
46
Safetensors
Model size
5B params
Tensor type
F32
·
BF16
·
F8_E4M3
·
U8
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for amkdg/Qwen3-Embedding-8B-NVFP4

Quantized
(37)
this model