Wan2.2-I2V-A14B-Moe-Distill-Lightx2v-NVFP4

Overview

This is a partial NVFP4 quantization of Wan2.2-I2V-A14B-Moe-Distill-Lightx2v by lightx2v, produced using convert_to_quant by silveroxides.

Wan2.2-I2V-A14B-Moe-Distill-Lightx2v is an image-to-video generation model built on Wan2.2-I2V-A14B. It applies step distillation and classifier-free guidance distillation to reduce inference to 4 steps without CFG, cutting generation time substantially while preserving output quality.

IMPORTANT

NVFP4 is supported only on NVIDIA Blackwell architecture GPUs. Running this model therefore requires a Blackwell GPU and a torch build that supports it, along with a recent version of ComfyUI and comfy-kitchen built against CUDA 13.
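
One way to check the GPU before downloading is to query its compute capability. The sketch below assumes that Blackwell-generation GPUs report a compute capability with a major version of 10 or higher (RTX 50-series cards report 12.0) and that the installed driver's nvidia-smi supports the compute_cap query field. The helper name is_blackwell_cc is made up for illustration.

```shell
#!/bin/bash
# Succeeds if the given compute capability (e.g. "12.0") has major version >= 10.
# Assumption: Blackwell-generation GPUs report compute capability 10.x or higher.
is_blackwell_cc() {
    major="${1%%.*}"                      # keep the part before the first dot
    case "$major" in
        ''|*[!0-9]*) return 1 ;;          # empty or non-numeric input
    esac
    [ "$major" -ge 10 ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Query the first GPU's compute capability, e.g. "12.0".
    cc="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
    if is_blackwell_cc "$cc"; then
        echo "Compute capability $cc: GPU should support NVFP4"
    else
        echo "Compute capability '$cc': GPU likely lacks NVFP4 support"
    fi
else
    echo "nvidia-smi not found; cannot check the GPU"
fi
```

This checks the hardware only. Whether the installed torch, ComfyUI, and comfy-kitchen builds support NVFP4 still has to be verified separately.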

Quantization

The model weights have been partially quantized to NVFP4 (NVIDIA Floating Point 4-bit) and MXFP8, quantization formats supported on NVIDIA Blackwell architecture GPUs.

The quantization format assigned to each layer is based on a sensitivity analysis performed with a custom script, which scores each weight tensor using excess kurtosis, dynamic range, and aspect ratio. Thresholds are derived automatically from the model's own score distribution.
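
For reference, the three statistics can be written as follows for a weight tensor with r rows, c columns, and n elements w_i with mean \mu and standard deviation \sigma. The excess kurtosis formula is the standard definition. The dynamic range and aspect ratio formulas are assumptions for illustration only, since the analysis script's exact definitions are not given here.

```latex
\[
\kappa_{\text{excess}}
  = \frac{\frac{1}{n}\sum_{i=1}^{n}(w_i-\mu)^4}{\sigma^4} - 3,
\qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(w_i-\mu)^2
\]
\[
\text{dynamic range (assumed)} = \frac{\max_i |w_i|}{\min_{i:\,w_i \neq 0} |w_i|},
\qquad
\text{aspect ratio (assumed)} = \frac{\max(r, c)}{\min(r, c)}
\]
```

A normal distribution has an excess kurtosis of 0. Positive values indicate heavier tails, meaning more outlier weights that a 4-bit format represents poorly.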

The analysis yields the following convert_to_quant parameters. The conversion takes about 4 hours per model (high noise and low noise) on an RTX 5060.

#!/bin/bash
# Usage: pass the input .safetensors file as the first argument.
convert_to_quant -i "${1}" \
    --nvfp4 --wan --comfy_quant --save-quant-metadata \
    --custom-type mxfp8 \
    --custom-layers "blocks\.(1|2|3)\.cross_attn\.k\.weight|blocks\.(6|8|9|10)\.cross_attn\.k\.weight|blocks\.(0|1|2|3)\.cross_attn\.v\.weight|blocks\.(6)\.cross_attn\.q\.weight|blocks\.(6|14)\.cross_attn\.o\.weight|blocks\.(0|1|2|3)\.cross_attn\.v_img\.weight|blocks\.(0)\.self_attn\.k\.weight|blocks\.(7|9|10|12|13|14)\.self_attn\.k\.weight|blocks\.(19)\.self_attn\.q\.weight|blocks\.(0|1|2|3)\.ffn\.0\.weight|blocks\.(36|37|38|39)\.ffn\.0\.weight|blocks\.(39)\.cross_attn\.(k|v|k_img|v_img)\.weight|blocks\.(39)\.self_attn\.(k|v)\.weight" \
    --exclude-layers "blocks\.(4|5|7)\.cross_attn\.k\.weight|blocks\.(0)\.cross_attn\.q\.weight|blocks\.(5|7|9|10|11|12|19|20)\.cross_attn\.o\.weight|blocks\.(8|11|33)\.self_attn\.k\.weight|blocks\.(38)\.self_attn\.k\.weight|blocks\.(14|16|17)\.self_attn\.q\.weight|blocks\.(39)\.cross_attn\.(q|o)\.weight|blocks\.(39)\.self_attn\.(q|o)\.weight|blocks\.(39)\.ffn\.2\.weight" \
    --num-iter 6000 \
    --top-p 0.35 \
    --calib-samples 8192 \
    --extract-lora --lora-rank 64 \
    --lora-target "ffn\.(0|2)\.weight|self_attn\.(v|o)\.weight" \
    -o "${1%%.safetensors}-nvfp4.safetensors" \
    --lora-output "${1%%.safetensors}-lora.safetensors"
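
To see which tensors a layer pattern selects, the alternatives can be tried against tensor names with grep -E. The sketch below copies a few alternatives from the patterns above and rests on several assumptions: that convert_to_quant applies the patterns as regular-expression searches, that excluded layers are left unquantized, and that exclusion takes precedence over the custom type. The function name classify_layer and the labels it prints are hypothetical.

```shell
#!/bin/bash
# Two alternatives copied from the --custom-layers pattern above.
custom_layers='blocks\.(0|1|2|3)\.ffn\.0\.weight|blocks\.(39)\.self_attn\.(k|v)\.weight'
# One alternative copied from the --exclude-layers pattern above.
exclude_layers='blocks\.(39)\.self_attn\.(q|o)\.weight'

# Print the assumed treatment of a tensor name:
# "excluded" (left as-is), "mxfp8" (custom type), or "nvfp4" (default).
classify_layer() {
    if printf '%s\n' "$1" | grep -Eq "$exclude_layers"; then
        echo "excluded"
    elif printf '%s\n' "$1" | grep -Eq "$custom_layers"; then
        echo "mxfp8"
    else
        echo "nvfp4"
    fi
}

for name in blocks.2.ffn.0.weight blocks.39.self_attn.k.weight \
            blocks.39.self_attn.q.weight blocks.20.ffn.0.weight; do
    printf '%s -> %s\n' "$name" "$(classify_layer "$name")"
done
```

Note that blocks.20.ffn.0.weight does not match the blocks\.(0|1|2|3)\. alternative, because the escaped dot must follow a single digit.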

Two rank-64 LoRAs were also extracted, one for the high-noise model and one for the low-noise model. They can be used to reduce the effects of quantization.
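
Because the script runs once per model, each run writes one quantized checkpoint and one LoRA file. The sketch below only derives the output names the script would produce, using the same ${1%%.safetensors} expansion. The input file names are hypothetical placeholders, and derive_outputs is an illustrative helper, not part of convert_to_quant.

```shell
#!/bin/bash
# Print the two output paths the conversion script derives from its input path.
derive_outputs() {
    base="${1%%.safetensors}"             # strip the .safetensors suffix
    echo "${base}-nvfp4.safetensors ${base}-lora.safetensors"
}

# Hypothetical input names; substitute the actual high- and low-noise checkpoints.
for input in model_high_noise.safetensors model_low_noise.safetensors; do
    echo "$input -> $(derive_outputs "$input")"
done
```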

Inference

The model can be used in ComfyUI with the following parameters, based on the distilled model's own recommendations:

Parameter   Value
Shift       5.0
Sampler     Euler
Scheduler   Simple
CFG         1.0
Steps       4

Both the euler/simple and heun/linear_quadratic combinations (sampler/scheduler) are known to produce good results.

The model is designed to generate 81 frames and is compatible with LoRAs. Sampling completes in under 60 seconds on an RTX 5060, making it possible to produce a full 81-frame video in under two minutes. With RIFE frame interpolation, those 81 frames convert to a 10-second video.
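
The 10-second figure can be checked with a little arithmetic. The sketch below assumes 2x interpolation and 16 fps playback, neither of which is stated here, and assumes the interpolator keeps the original frames and adds new ones only in the gaps between them. The function names are illustrative.

```shell
#!/bin/bash
# Assumptions (not stated in this model card): 16 fps playback, 2x interpolation.
frames=81
fps=16
factor=2

# Frame count after interpolation: the original frames are kept and
# (factor - 1) new frames are added in each of the (frames - 1) gaps.
interpolated_frames() {
    echo $(( ($1 - 1) * $2 + 1 ))
}

# Duration in seconds for a frame count and frame rate, to one decimal place.
duration_seconds() {
    awk -v n="$1" -v f="$2" 'BEGIN { printf "%.1f\n", n / f }'
}

total=$(interpolated_frames "$frames" "$factor")
echo "frames after interpolation: $total"
echo "duration before: $(duration_seconds "$frames" "$fps") s"
echo "duration after:  $(duration_seconds "$total" "$fps") s"
```

Under these assumptions, the 81 generated frames become 161 frames, which play for about 10 seconds at 16 fps.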

Abrupt camera movements or fast subject motion may produce artifacts. This is an inherent limitation of applying aggressive quantization to an already distilled model.

License Agreement

This model is licensed under the Apache 2.0 License. You retain full ownership of your generated content, but are solely responsible for its use in compliance with the license terms and applicable laws.

Acknowledgements

Big kudos to the contributors to the Wan2.2 and Self-Forcing repositories for their open research, and to silveroxides for their quantization tools.
