Qwen3-Embedding-8B — NVFP4
NVFP4 (W4A4) quantization of Qwen/Qwen3-Embedding-8B,
in compressed-tensors format. Runs directly in vLLM.
Calibration was run on multi-turn conversation prompts (concatenated user turns from full conversations) — not query/document pairs.
- Downloads last month
- 46
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support