Qwen3-VL-4B-Instruct-heretic-NVFP4
Overview
This repository provides an NVFP4 (FP4 E2M1) mixed-precision quantized build of Qwen3-VL-4B-Instruct-heretic in ComfyUI comfy_quant format, primarily intended for use as a text encoder (e.g. for Krea 2 / Qwen3-VL conditioning).
This model is a decensored derivative of the official Qwen/Qwen3-VL-4B-Instruct, modified using Heretic v1.1.0.
Quantization Details
- Backend: ComfyUI convert_to_quant v1.2.6 (comfy_kitchen CUDA NVFP4 kernels)
- Format: comfy_quant mixed precision
  - NVFP4 (FP4 E2M1, 16-element blocks, learned rounding / SVD optimization): text transformer blocks 2–33 and all vision 2D linear weights
  - FP8 (float8_e4m3fn, tensorwise, learned rounding): text blocks 1 & 34
  - FP16 (bfloat16): embeddings, model.norm, all norms/biases, pos_embed, patch_embed, and text blocks 0 & 35
- Hardware target: NVIDIA Blackwell (SM ≥ 10.0 datacenter, or SM ≥ 12.0 consumer RTX 50 series), required for the NVFP4 format
- Size: 3.44 GB (vs 8.04 GB for the FP16 source, ~57% smaller)
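To make the "FP4 E2M1, 16-element blocks" wording concrete, here is a minimal sketch of block-scaled 4-bit quantization in plain Python. It uses simple round-to-nearest rather than the learned rounding / SVD optimization used for this checkpoint, and it keeps each block scale as a full-precision Python float, which is a simplification. The function names are illustrative and are not part of convert_to_quant or comfy_kitchen.

```python
# Illustrative round-to-nearest block quantization in the style of NVFP4.
# This is NOT the learned-rounding / SVD procedure used for this checkpoint.

# Magnitudes representable by FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK = 16  # number of elements that share one scale


def quantize_block(block):
    """Return (scale, codes); each code is a signed E2M1 value."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest E2M1 value.
    scale = amax / E2M1_VALUES[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - target))
        codes.append(-nearest if x < 0 else nearest)
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate values from a scale and E2M1 codes."""
    return [c * scale for c in codes]


def quantize_dequantize(values):
    """Round-trip a flat list whose length is a multiple of BLOCK."""
    if len(values) % BLOCK:
        raise ValueError("length must be a multiple of 16")
    out = []
    for i in range(0, len(values), BLOCK):
        scale, codes = quantize_block(values[i:i + BLOCK])
        out.extend(dequantize_block(scale, codes))
    return out
```

Because the widest gap on the E2M1 grid is between 4 and 6, the round-trip error of any element in this sketch is at most one block scale. That is why a single outlier in a block degrades the precision of its 15 neighbours.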
Layer breakdown
| Tier | Layers | Count |
|---|---|---|
| FP16 (bf16) | embed_tokens, model.norm, all norms/biases, pos_embed, patch_embed, text blocks 0 & 35 | kept lossless |
| FP8 (e4m3fn, tensorwise) | text blocks 1 & 34 | 14 weights |
| NVFP4 (E2M1, block=16) | text blocks 2–33 + all vision 2D linears | 328 weights |
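The tiering in the table can be written as a small rule. The sketch below is only an illustration: the tensor-name pattern (`layers.<n>.`), the `is_vision` flag, and the function name are assumptions, and the real names in the checkpoint may differ.

```python
import re

NUM_TEXT_BLOCKS = 36  # text blocks 0..35, as listed in the table

# Hypothetical name pattern for text-transformer blocks; the real tensor
# names in the checkpoint may differ.
_BLOCK_RE = re.compile(r"layers\.(\d+)\.")


def tier_for(name, ndim, is_vision=False):
    """Return 'bf16', 'fp8', or 'nvfp4' for one tensor."""
    # Norms, biases, and other non-2D tensors stay in bf16.
    if ndim != 2 or name.endswith(".bias") or "norm" in name:
        return "bf16"
    # Embedding-like tensors stay in bf16 as well.
    if any(k in name for k in ("embed_tokens", "pos_embed", "patch_embed")):
        return "bf16"
    # All remaining vision 2D linears go to NVFP4.
    if is_vision:
        return "nvfp4"
    m = _BLOCK_RE.search(name)
    if m is None:
        return "bf16"
    idx = int(m.group(1))
    if idx in (0, NUM_TEXT_BLOCKS - 1):   # first and last block
        return "bf16"
    if idx in (1, NUM_TEXT_BLOCKS - 2):   # second and second-to-last block
        return "fp8"
    return "nvfp4"                        # blocks 2..33
```

The 14 FP8 weights over two blocks suggest seven 2D linear weights per text block. On that assumption, blocks 2–33 would account for 32 × 7 = 224 of the 328 NVFP4 weights, leaving roughly 104 for the vision linears.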
Usage (ComfyUI)
Place qwen3_vl_4b_nvfp4_full.safetensors in ComfyUI/models/text_encoders/.
- As Krea 2 text encoder: load with a CLIPLoader node, type krea2.
- As generic Qwen3-VL text encoder: load with a CLIPLoader node, type qwen3vl_4b.
ComfyUI auto-detects the quantization metadata (the log shows "Found quantization metadata version 1")
and uses MixedPrecisionOps for the text encoder. The NVFP4 weights are dequantized
to bf16 at load time, so input_scale is not required for the text-encoder path;
quality is determined by the weight quantization alone.
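To check what is stored in the file without loading any weights, you can read the safetensors header directly. A .safetensors file starts with an 8-byte little-endian length, followed by a JSON header of that size. The sketch below uses only the standard library. This card does not say where comfy_quant keeps its quantization metadata, so the helper returns both the file-level `__metadata__` entry and a per-dtype tensor count. The function names are illustrative.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Parse the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        return json.loads(f.read(header_len))


def summarize(header):
    """Return (file-level metadata dict, Counter of tensor dtypes)."""
    metadata = header.get("__metadata__", {})
    dtypes = Counter(
        entry["dtype"]
        for name, entry in header.items()
        if name != "__metadata__"
    )
    return metadata, dtypes
```

For example, `summarize(read_safetensors_header("ComfyUI/models/text_encoders/qwen3_vl_4b_nvfp4_full.safetensors"))` shows how many tensors are stored in each dtype.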
Quality
Against the FP16 reference (Krea 2 12-layer-tap conditioning output, 6 held-out prompts):
- Cosine similarity: ~0.988 mean
- Relative L2 error: ~0.155 mean
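For reference, the two metrics can be computed as follows. This is a minimal sketch over flat lists of floats. How the reported means were aggregated (for example per prompt, then averaged) is not stated here, so that part is left to the caller.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def relative_l2_error(ref, test):
    """||test - ref||_2 / ||ref||_2, with ref as the FP16 reference."""
    diff = math.sqrt(sum((t - r) ** 2 for r, t in zip(ref, test)))
    return diff / math.sqrt(sum(r * r for r in ref))
```

For two vectors of equal norm, the metrics are linked by relative L2 = sqrt(2 · (1 − cosine)). With a cosine of about 0.988 this gives about 0.155, which is consistent with the figures above.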
License
Apache-2.0 (inherited from the base model).
Model tree for SergiusFlavius/Qwen3-VL-4B-Instruct-heretic-NVFP4
Base model
Qwen/Qwen3-VL-4B-Instruct