Qwen3-VL-4B-Instruct-heretic-NVFP4

Overview

This repository provides an NVFP4 (FP4 E2M1) mixed-precision quantized build of Qwen3-VL-4B-Instruct-heretic in ComfyUI comfy_quant format, primarily intended for use as a text encoder (e.g. for Krea 2 / Qwen3-VL conditioning).

This model is a decensored derivative of the official Qwen/Qwen3-VL-4B-Instruct, modified using Heretic v1.1.0.

Quantization Details

  • Backend: ComfyUI convert_to_quant v1.2.6 (comfy_kitchen CUDA NVFP4 kernels)
  • Format: comfy_quant mixed precision
    • NVFP4 (FP4 E2M1, 16-element blocks, learned rounding / SVD optimization): text transformer blocks 2–33 and all vision 2D linear weights
    • FP8 (float8_e4m3fn, tensorwise, learned rounding): text blocks 1 & 34
    • FP16 (bfloat16): embeddings, model.norm, all norms/biases, pos_embed, patch_embed, and text blocks 0 & 35
  • Hardware target: NVIDIA Blackwell (SM ≥ 10.0 datacenter GPUs, or SM ≥ 12.0 consumer RTX 50 series); Blackwell is required for the NVFP4 format
  • Size: 3.44 GB, versus 8.04 GB for the FP16 source (about 57% smaller)
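To make the NVFP4 entry more concrete, here is a minimal sketch of block-scaled FP4 E2M1 quantization in Python. It assumes plain round-to-nearest with one float scale per 16-element block, chosen so the block's largest magnitude maps onto 6.0 (the largest E2M1 value). This is a simplification: the real NVFP4 format stores each block scale in FP8 (E4M3) together with a per-tensor scale, and this build uses learned rounding / SVD optimization, which the sketch does not reproduce. The function names are hypothetical and are not part of convert_to_quant or comfy_kitchen.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # elements per NVFP4 block, as stated in this card


def quantize_block(values):
    """Quantize one 16-element block to E2M1 codes plus one shared scale.

    Simplified sketch: round-to-nearest with a float scale. Real NVFP4 stores
    the block scale in FP8 (E4M3) plus a per-tensor scale, and this repo's
    build uses learned rounding instead of round-to-nearest.
    """
    if len(values) != BLOCK:
        raise ValueError(f"expected {BLOCK} values, got {len(values)}")
    amax = max(abs(v) for v in values)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the largest magnitude onto 6.0
    codes = []
    for v in values:
        mag = abs(v) / scale
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        codes.append(math.copysign(nearest, v))  # reattach the sign bit
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate values: each code times the shared block scale."""
    return [c * scale for c in codes]


# Example: a ramp from -0.75 to 0.75 in steps of 0.1.
block = [0.1 * i - 0.75 for i in range(BLOCK)]
codes, scale = quantize_block(block)
restored = dequantize_block(codes, scale)
max_err = max(abs(a - b) for a, b in zip(block, restored))
```

Assuming an 8-bit scale per block, 16 four-bit codes plus one shared scale work out to 4 + 8/16 = 4.5 bits per weight for the NVFP4 tensors, before the FP8 and 16-bit tiers are counted.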

Layer breakdown

| Tier | Layers | Count |
|---|---|---|
| FP16 (bf16) | embed_tokens, model.norm, all norms/biases, pos_embed, patch_embed, text blocks 0 & 35 | (kept lossless) |
| FP8 (e4m3fn, tensorwise) | text blocks 1 & 34 | 14 weights |
| NVFP4 (E2M1, block=16) | text blocks 2–33 + all vision 2D linears | 328 weights |
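The text-side counts in the table can be reproduced with a little arithmetic, assuming each of the 36 Qwen3 text decoder blocks carries 7 linear weights (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj). The helper below is hypothetical; it only encodes the tier assignment from the table and tallies weights under that assumption.

```python
NUM_TEXT_BLOCKS = 36  # text decoder blocks 0..35
# Assumption: 7 linear weights per Qwen3 decoder block
# (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj).
LINEARS_PER_BLOCK = 7


def text_block_tier(i):
    """Hypothetical helper: precision tier of text block i, per the table."""
    if i in (0, NUM_TEXT_BLOCKS - 1):
        return "bf16"   # first and last blocks kept lossless
    if i in (1, NUM_TEXT_BLOCKS - 2):
        return "fp8"    # second and second-to-last blocks
    return "nvfp4"      # blocks 2..33


counts = {}
for i in range(NUM_TEXT_BLOCKS):
    tier = text_block_tier(i)
    counts[tier] = counts.get(tier, 0) + LINEARS_PER_BLOCK

# The NVFP4 total in the table (328) also includes the vision linears;
# subtracting the text share gives the implied vision count.
implied_vision_nvfp4 = 328 - counts["nvfp4"]
```

Under that assumption, blocks 1 and 34 give 2 × 7 = 14 FP8 weights, matching the table, and blocks 2–33 give 32 × 7 = 224 NVFP4 weights, leaving 328 − 224 = 104 for the vision linears. The vision figure is derived from the table total; this card does not list it separately.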

Usage (ComfyUI)

Place qwen3_vl_4b_nvfp4_full.safetensors in ComfyUI/models/text_encoders/.

  • As a Krea 2 text encoder: load with a CLIPLoader node with type set to krea2.
  • As a generic Qwen3-VL text encoder: load with a CLIPLoader node with type set to qwen3vl_4b.

ComfyUI auto-detects the quantization metadata (logged as "Found quantization metadata version 1") and uses MixedPrecisionOps for the text encoder. The NVFP4 weights are dequantized to bf16 at load time, so no input_scale is needed on the text-encoder path, and output quality depends only on the weight quantization.
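To check what the file contains before loading it in ComfyUI, you can read the safetensors header directly: the format starts with an 8-byte little-endian header length, followed by a JSON header mapping tensor names to dtype, shape, and data offsets, plus an optional __metadata__ entry. The sketch below uses only the Python standard library; the function names are hypothetical, and it does not assume any particular key name for the comfy_quant metadata.

```python
import json
import struct
from collections import Counter


def parse_header(data):
    """Split safetensors header bytes into tensor entries and __metadata__."""
    (header_len,) = struct.unpack_from("<Q", data, 0)  # little-endian u64 length
    header = json.loads(data[8:8 + header_len])
    metadata = header.pop("__metadata__", {})
    return header, metadata


def read_safetensors_header(path):
    """Read only the header of a .safetensors file, not the tensor data."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return parse_header(prefix + f.read(header_len))


def dtype_histogram(tensors):
    """Count tensors per stored dtype string (e.g. "BF16", "F8_E4M3", "U8")."""
    return Counter(entry["dtype"] for entry in tensors.values())
```

For example, `tensors, meta = read_safetensors_header("ComfyUI/models/text_encoders/qwen3_vl_4b_nvfp4_full.safetensors")` followed by `print(dtype_histogram(tensors), list(meta))` shows how the stored tensors are split across dtypes and which metadata keys are present.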

Quality

Measured against the FP16 reference on the Krea 2 conditioning output (12-layer tap), over 6 held-out prompts:

  • Cosine similarity: ~0.988 mean
  • Relative L2 error: ~0.155 mean
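The card does not spell out how the two metrics are computed. A common definition, assumed here, flattens each conditioning output, computes the cosine similarity and the relative L2 error ||x − ref|| / ||ref|| against the FP16 reference for each prompt, and then averages over prompts. The sketch below shows the per-prompt computation in plain Python; the function names are hypothetical.

```python
import math


def cosine_similarity(x, ref):
    """Cosine of the angle between two flattened vectors."""
    dot = sum(a * b for a, b in zip(x, ref))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in ref)))


def relative_l2_error(x, ref):
    """||x - ref|| / ||ref||, with ref being the FP16 reference output."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, ref)))
    return diff / math.sqrt(sum(b * b for b in ref))


def norm_matched_rel_l2(cos):
    """Relative L2 error implied by a cosine when ||x|| == ||ref||.

    ||x - ref||^2 = ||x||^2 + ||ref||^2 - 2 x.ref = ||ref||^2 (2 - 2 cos).
    """
    return math.sqrt(2.0 - 2.0 * cos)
```

The two reported means are roughly consistent with each other: when the quantized and reference outputs have about the same norm, the relative L2 error equals sqrt(2 − 2·cos), and sqrt(2 − 2 × 0.988) ≈ 0.155. The match is only approximate, because a mean of per-prompt values does not pass exactly through this nonlinear relation.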

License

Apache-2.0 (inherited from the base model).

Repository: SergiusFlavius/Qwen3-VL-4B-Instruct-heretic-NVFP4