Qwen3-VL-4B-Instruct-heretic-NVFP4

Overview

This repository provides an NVFP4 (FP4 E2M1) mixed-precision quantized build of Qwen3-VL-4B-Instruct-heretic in ComfyUI comfy_quant format, primarily intended for use as a text encoder (e.g. for Krea 2 / Qwen3-VL conditioning).

This model is a decensored derivative of the official Qwen/Qwen3-VL-4B-Instruct, modified using Heretic v1.1.0.

Quantization Details

Backend: ComfyUI convert_to_quant v1.2.6 (comfy_kitchen CUDA NVFP4 kernels)
Format: comfy_quant mixed precision
- NVFP4 (FP4 E2M1, 16-element blocks, learned rounding / SVD optimization): text transformer blocks 2–33 and all vision 2D linear weights
- FP8 (float8_e4m3fn, tensorwise, learned rounding): text blocks 1 & 34
- BF16 (bfloat16), left unquantized: embeddings, model.norm, all norms/biases, pos_embed, patch_embed, and text blocks 0 & 35
Hardware target: NVIDIA Blackwell (SM ≥ 10.0 datacenter, or SM ≥ 12.0 consumer RTX 50 series) — required for the NVFP4 format
Size: 3.44 GB (vs 8.04 GB for the 16-bit source, about 57% smaller)
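
To make the "FP4 E2M1, 16-element blocks" wording concrete, here is a minimal pure-Python sketch of block-scaled E2M1 quantization. It is an illustration only, not the convert_to_quant implementation: it uses plain round-to-nearest instead of learned rounding / SVD optimization, and it keeps the per-block scale as a Python float, whereas NVFP4 stores block scales in FP8 (E4M3) together with a per-tensor scale. The function names are hypothetical.

```python
# Magnitudes representable by FP4 E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # NVFP4 shares one scale across 16 consecutive elements


def quantize_block(block):
    """Round one block to E2M1 values under a single shared scale.

    Assumption: simple round-to-nearest; the actual tool uses learned rounding.
    """
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto 6.0, the E2M1 maximum.
    scale = amax / E2M1_VALUES[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        mag = min(E2M1_VALUES, key=lambda v: abs(target - v))
        codes.append(-mag if x < 0 else mag)
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate weights from E2M1 values and the block scale."""
    return [c * scale for c in codes]


def quantize_weights(weights):
    """Split a flat list of weights into 16-element blocks and quantize each."""
    if len(weights) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of 16")
    return [
        quantize_block(weights[i:i + BLOCK_SIZE])
        for i in range(0, len(weights), BLOCK_SIZE)
    ]
```

Because the largest gap between adjacent E2M1 magnitudes is 2 (between 4 and 6), the round-to-nearest error of any element in this sketch is at most one block scale.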

Layer breakdown

| Tier | Layers | Weights |
|---|---|---|
| BF16 (bfloat16) | `embed_tokens`, `model.norm`, all norms/biases, `pos_embed`, `patch_embed`, text blocks 0 & 35 | kept lossless |
| FP8 (e4m3fn, tensorwise) | text blocks 1 & 34 | 14 |
| NVFP4 (E2M1, block=16) | text blocks 2–33 + all vision 2D linears | 328 |
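
The text-block tiering in the table can be written as a small lookup, which also allows a sanity check on the weight counts. This is a sketch: `tier_for_text_block` is a hypothetical helper, and the per-block weight count is inferred from the table (14 FP8 weights across 2 blocks) under the assumption that every text block contributes the same number of quantized weights, which the source does not state.

```python
from collections import Counter

FIRST_BLOCK, LAST_BLOCK = 0, 35   # text blocks 0..35, per the table
BF16_BLOCKS = {0, 35}             # outermost blocks stay in 16-bit
FP8_BLOCKS = {1, 34}              # next-outermost blocks use FP8


def tier_for_text_block(index):
    """Return the precision tier used for the linear weights of a text block."""
    if not FIRST_BLOCK <= index <= LAST_BLOCK:
        raise ValueError(f"block index out of range: {index}")
    if index in BF16_BLOCKS:
        return "bf16"
    if index in FP8_BLOCKS:
        return "fp8_e4m3fn"
    return "nvfp4"


blocks_per_tier = Counter(
    tier_for_text_block(i) for i in range(FIRST_BLOCK, LAST_BLOCK + 1)
)

# Inferred, not stated in the source: 14 FP8 weights over 2 blocks.
weights_per_block = 14 // blocks_per_tier["fp8_e4m3fn"]
nvfp4_text_weights = blocks_per_tier["nvfp4"] * weights_per_block
# Remainder of the 328 NVFP4 weights, attributed to the vision linears.
nvfp4_vision_weights = 328 - nvfp4_text_weights
```

Under that assumption, the 328 NVFP4 weights split into 224 text weights (32 blocks of 7) and 104 vision weights.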

Usage (ComfyUI)

Place qwen3_vl_4b_nvfp4_full.safetensors in ComfyUI/models/text_encoders/.

As a Krea 2 text encoder: load it with a CLIPLoader node and set type to krea2.
As a generic Qwen3-VL text encoder: load it with a CLIPLoader node and set type to qwen3vl_4b.

ComfyUI auto-detects the quantization metadata (the log shows "Found quantization metadata version 1") and uses MixedPrecisionOps for the text encoder. The NVFP4 weights are dequantized to bf16 at load time, so input_scale is not required for the text-encoder path; quality is determined by the weight quantization alone.
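
To check what a downloaded file contains before loading it in ComfyUI, the safetensors header can be read directly: the file starts with an 8-byte little-endian length followed by a JSON header. The sketch below uses only the standard library and makes no assumption about which keys comfy_quant writes; it simply returns whatever `__metadata__` holds plus a count of tensors per stored dtype. The helper names are hypothetical.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(path):
    """Return (metadata dict, Counter of tensor dtypes) for a safetensors file."""
    header = read_safetensors_header(path)
    # "__metadata__" is the optional free-form string map in the header.
    metadata = header.pop("__metadata__", {})
    dtypes = Counter(entry["dtype"] for entry in header.values())
    return metadata, dtypes
```

Calling `summarize("ComfyUI/models/text_encoders/qwen3_vl_4b_nvfp4_full.safetensors")` returns the metadata and the dtype histogram for this build; the exact keys and dtype names depend on how convert_to_quant serialized the file.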

Quality

Against the FP16 reference (Krea 2 12-layer-tap conditioning output, 6 held-out prompts):

Cosine similarity: ~0.988 mean
Relative L2 error: ~0.155 mean
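
For reference, the two metrics can be computed as follows. This is a sketch under an assumption: the source does not say whether the conditioning tensors were compared per token or as a whole, so the functions below operate on two flattened vectors of equal length. The function names are hypothetical.

```python
import math


def _check_lengths(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def cosine_similarity(ref, test):
    """Cosine of the angle between the reference and quantized outputs."""
    _check_lengths(ref, test)
    dot = sum(r * t for r, t in zip(ref, test))
    norm_ref = math.sqrt(sum(r * r for r in ref))
    norm_test = math.sqrt(sum(t * t for t in test))
    return dot / (norm_ref * norm_test)


def relative_l2_error(ref, test):
    """L2 norm of the difference, divided by the L2 norm of the reference."""
    _check_lengths(ref, test)
    diff = math.sqrt(sum((r - t) ** 2 for r, t in zip(ref, test)))
    return diff / math.sqrt(sum(r * r for r in ref))
```

When the two vectors have the same norm, the metrics are tied together by relative L2 error = sqrt(2 × (1 − cosine similarity)). Plugging in 0.988 gives about 0.155, so the two reported figures are consistent with each other if the quantized output roughly preserves the norm of the reference.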

License

Apache-2.0 (inherited from the base model).
