HiSQRot4
HiSQRot4: SmoothQuant-Rotation HiFloat4 PTQ for W4A4 Text-to-Video Diffusion Models
HiSQRot4 is a post-training quantization (PTQ) project for Wan2.2 text-to-video inference. It keeps the original Wan2.2 denoising pipeline intact and replaces the target `Linear` layers with a W4A4 (4-bit weight, 4-bit activation) HiFloat4 inference path.
Table of Contents
- Method Overview
- Environment Setup
- Inference with Released Quantized Weights
- Reproducing the Quantization Pipeline
- VBench Evaluation
- Repository Layout
- Acknowledgements
1. Method Overview
HiSQRot4 uses a three-stage post-training quantization pipeline for Wan2.2 text-to-video inference.
- Stage 1: Calibration collects per-layer activation min/max statistics at group, branch, and channel granularity.
- Stage 2: Quantized artifact preparation builds SmoothQuant channel masks, applies a Hadamard-style rotation matrix to the input channel space, quantizes target weights to HiFloat4, and builds MinMax lookup ranges for runtime activation quantization.
- Stage 3: Inference loads the prepared artifacts and runs Wan2.2 generation with all `Linear` layers in every transformer block replaced by the HiFloat4 W4A4 path.
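The W4A4 path can be illustrated with a simulated ("fake-quantized") linear layer in which both the activations and the weights are rounded to a 4-bit float grid before the matrix product. This is a minimal sketch, not HiSQRot4's CUDA kernel: the HiFloat4 encoding is not specified in this README, so the code substitutes the standard FP4 E2M1 magnitude grid (0, 0.5, 1, 1.5, 2, 3, 4, 6) with one absmax scale per row, and `quantize_fp4` and `w4a4_linear` are hypothetical names.

```python
import numpy as np

# Stand-in 4-bit float grid: the non-negative magnitudes of FP4 E2M1.
# This is NOT the HiFloat4 format; it only illustrates 4-bit float rounding.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def quantize_fp4(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Fake-quantize x to the FP4 grid with one absmax scale per slice along `axis`."""
    absmax = np.max(np.abs(x), axis=axis, keepdims=True)
    scale = np.where(absmax > 0, absmax / FP4_GRID[-1], 1.0)
    scaled = np.abs(x) / scale
    # Round each magnitude to the nearest grid point, then restore sign and scale.
    idx = np.argmin(np.abs(scaled[..., None] - FP4_GRID), axis=-1)
    return np.sign(x) * FP4_GRID[idx] * scale


def w4a4_linear(x: np.ndarray, w: np.ndarray, b=None) -> np.ndarray:
    """y = Q(x) @ Q(W)^T (+ b): activations and weights both pass through 4-bit rounding."""
    xq = quantize_fp4(x, axis=-1)  # one scale per token (row of x)
    wq = quantize_fp4(w, axis=-1)  # one scale per output channel (row of W)
    y = xq @ wq.T
    return y if b is None else y + b


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))   # 4 tokens, 64 input features
w = rng.standard_normal((32, 64))  # 32 output features
y_ref = x @ w.T
y_q = w4a4_linear(x, w)
rel_err = np.linalg.norm(y_q - y_ref) / np.linalg.norm(y_ref)
print(f"relative error of the 4-bit product: {rel_err:.3f}")
```

Per-row absmax scaling is the simplest choice for a sketch; the real kernel's scaling granularity is not documented here.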
| Component | Role |
|---|---|
| HiFloat4 | 4-bit floating-point W4A4 inference path |
| MinMax lookup | Offline activation range lookup for Stage 3 activation quantization |
| SmoothQuant | Channel scaling derived from activation min/max and weight magnitudes |
| Hadamard rotation | Input channel rotation with a deterministic Hadamard-style matrix folded into the prepared weight path |
| alpha=0.9 | SmoothQuant alpha used for the released Stage 2 artifacts |
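For the SmoothQuant row, the original SmoothQuant paper computes one smoothing factor per input channel j as s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), then divides the activations by s and multiplies the weight columns by s so the layer output is unchanged. The sketch below assumes HiSQRot4 uses this standard form with the released alpha=0.9; the README only says the scaling is derived from activation min/max and weight magnitudes, so Stage 2's exact formula may differ, and the channel masks mentioned in Stage 2 are not modeled. `smoothquant_scales` is a hypothetical helper.

```python
import numpy as np


def smoothquant_scales(act_min, act_max, weight, alpha=0.9, eps=1e-5):
    """Per-input-channel smoothing factors in the standard SmoothQuant form.

    act_min, act_max: calibrated per-channel activation ranges, shape (in_features,)
    weight: Linear weight, shape (out_features, in_features)
    """
    act_absmax = np.maximum(np.abs(act_min), np.abs(act_max))
    w_absmax = np.max(np.abs(weight), axis=0)  # max over output rows, per input channel
    s = act_absmax ** alpha / np.maximum(w_absmax, eps) ** (1 - alpha)
    return np.maximum(s, eps)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
x[:, 3] *= 50.0  # an outlier activation channel
w = rng.standard_normal((4, 16))
s = smoothquant_scales(x.min(axis=0), x.max(axis=0), w, alpha=0.9)

# Folding: X' = X / s and W' = W * s leave the product unchanged.
y_ref = x @ w.T
y_smooth = (x / s) @ (w * s).T
```

With alpha close to 1, most of the outlier channel's range is moved into the weights, which is why a high alpha suits activations with strong outliers.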
This Hugging Face release is self-contained for alpha=0.9 Stage 3 inference. It includes:
- Wan2.2-T2V-A14B weights in `models/Wan2.2-T2V-A14B/`.
- Prebuilt alpha=0.9 Stage 2 artifacts in `artifacts/hisqrot4_alpha_0p9/`.
- The 30-prompt VBench input file `data/prompts/OpenS2V-5M_to_mm_vbench_30.json`.
VBench evaluation results:
| Model | Image quality | Aesthetic quality | Overall consistency | Subject consistency | Motion smoothness |
|---|---|---|---|---|---|
| Wan2.2 original | 71.53% | 59.03% | 8.45% | 95.40% | 98.92% |
| Wan2.2 W4A4 quantized | 73.06% | 58.98% | 8.55% | 96.12% | 98.83% |
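The per-metric gap between the two rows is plain arithmetic on the table (quantized minus original, in percentage points):

```python
# VBench scores copied from the table above (percent).
metrics = ["Image quality", "Aesthetic quality", "Overall consistency",
           "Subject consistency", "Motion smoothness"]
original = [71.53, 59.03, 8.45, 95.40, 98.92]
quantized = [73.06, 58.98, 8.55, 96.12, 98.83]

# Difference in percentage points, rounded to two decimals.
deltas = {m: round(q - o, 2) for m, o, q in zip(metrics, original, quantized)}
for name, d in deltas.items():
    print(f"{name:20s} {d:+.2f} pp")
```

On these numbers the quantized model scores higher on image quality (+1.53), overall consistency (+0.10), and subject consistency (+0.72), and slightly lower on aesthetic quality (-0.05) and motion smoothness (-0.09).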
2. Environment Setup
The released artifacts were validated with the following runtime:
```text
python       3.10.20
torch        2.10.0+cu128
torchvision  0.25.0+cu128
torchaudio   2.10.0+cu128
triton       3.6.0
flash-attn   2.8.3
```
Create the conda environment and install the pinned PyTorch stack first:
```bash
conda create -n hisqrot4 python=3.10 -y
conda activate hisqrot4
pip install \
  torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 \
  --index-url https://download.pytorch.org/whl/cu128
```
Install the remaining runtime dependencies:
```bash
pip install -r requirements.txt
```
requirements.txt includes a prebuilt flash-attn wheel for Linux x86_64,
Python 3.10, and PyTorch 2.10.0+cu128. This avoids multi-hour local source
builds. If you use a different Python, PyTorch, CUDA, or platform combination,
install a matching flash-attn wheel or build it from source.
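Before running `pip install -r requirements.txt`, you can check whether the prebuilt flash-attn wheel's target (Linux x86_64, Python 3.10, PyTorch 2.10.0+cu128) matches your machine. A minimal sketch; `prebuilt_wheel_matches` is a hypothetical helper, and the PyTorch version is passed in so the check itself does not require torch.

```python
import platform
import sys

# Target of the prebuilt flash-attn wheel, as stated in this README.
WHEEL_TARGET = {
    "system": "Linux",
    "machine": "x86_64",
    "python": (3, 10),
    "torch": "2.10.0+cu128",
}


def prebuilt_wheel_matches(system, machine, python_version, torch_version):
    """Return a list of mismatches; an empty list means the prebuilt wheel should apply."""
    problems = []
    if system != WHEEL_TARGET["system"]:
        problems.append(f"OS is {system}, wheel targets {WHEEL_TARGET['system']}")
    if machine != WHEEL_TARGET["machine"]:
        problems.append(f"arch is {machine}, wheel targets {WHEEL_TARGET['machine']}")
    if tuple(python_version[:2]) != WHEEL_TARGET["python"]:
        problems.append(f"Python {python_version[0]}.{python_version[1]}, wheel targets 3.10")
    if str(torch_version) != WHEEL_TARGET["torch"]:
        problems.append(f"torch {torch_version}, wheel targets {WHEEL_TARGET['torch']}")
    return problems


if __name__ == "__main__":
    try:
        import torch
        torch_version = str(torch.__version__)
    except ImportError:
        torch_version = "not installed"
    issues = prebuilt_wheel_matches(platform.system(), platform.machine(),
                                    sys.version_info, torch_version)
    print("prebuilt flash-attn wheel applies" if not issues else "\n".join(issues))
```

Any reported mismatch means you should install a matching flash-attn wheel or build from source, as described above.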
Build the HiFloat4 CUDA extension from the repository root:
```bash
cd hifloat4/hifx4_gpu
bash build.sh
cd ../..
```
3. Inference with Released Quantized Weights
Use this section if you want to run inference with the quantized weights and artifacts shipped in this repository. You do not need to run Stage 1 or Stage 2.
3.1 Single-Prompt Inference
Run Stage 3 W4A4 inference with the released alpha=0.9 artifacts:
```bash
GPU_COUNT=1 \
PROMPT="A cinematic shot of a cat surfing on the sea." \
OUTPUT_FILE="video_output/hifx4/single_prompt_alpha0p9.mp4" \
bash runfiles/05_infer_single_prompt.sh
```
Replace the PROMPT line with your own text prompt for interactive testing.
3.2 Batch Inference from a Prompt File
Run batch W4A4 inference over a JSON prompt file. Set the environment variables below to override the script defaults when needed:
```bash
GPU_COUNT=1 \
PROMPT_FILE="data/prompts/OpenS2V-5M_to_mm_vbench_30.json" \
OUT_FOLDER="video_output/hifx4/OpenS2V-5M_to_mm_vbench_30_alpha0p9" \
bash runfiles/04_infer_hisqrot4_alpha0p9_vbench30.sh
```
For prompt files with a `path` field, each generated video is named after the basename of that path; for example, `videos/example.mp4` becomes `example.mp4`.
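The naming rule can be sketched as follows. The prompt-file schema is an assumption here: the snippet treats the file as a JSON list of objects with an optional `path` field, and `output_path_for` is a hypothetical helper. Check the actual prompt file and inference script for the real field names and for how entries without `path` are named.

```python
import json
import os
from typing import Optional


def output_path_for(entry: dict, out_folder: str) -> Optional[str]:
    """Map a prompt entry with a `path` field to OUT_FOLDER/<basename of path>.

    Returns None for entries without `path`; their naming is not specified here.
    """
    path = entry.get("path")
    if not path:
        return None
    return os.path.join(out_folder, os.path.basename(path))


# Hypothetical prompt-file content (schema assumed, not taken from the repository).
entries = json.loads('[{"prompt": "A cat surfing.", "path": "videos/example.mp4"}]')
out_folder = "video_output/hifx4/OpenS2V-5M_to_mm_vbench_30_alpha0p9"
for e in entries:
    print(output_path_for(e, out_folder))
```

Because only the basename is kept, two entries whose paths differ only in directory would map to the same output file.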
4. Reproducing the Quantization Pipeline
Use this section if you want to rebuild the calibration statistics and quantized artifacts yourself. The released alpha=0.9 quick start in Section 3 does not need these steps.
4.1 Stage 1: Calibration
Stage 1 runs the original model and records activation min/max statistics for target Linear layers.
```bash
PROMPT_FILE="data/prompts/OpenS2V-5M_to_mm_calib_30.json" \
ART_ROOT="state_quant/hisqrot4_ptq" \
bash runfiles/01_calibrate_ptq_standard.sh
```
Expected outputs:
```text
${ART_ROOT}/
  low_noise_model/hifx4/calibration.pt
  high_noise_model/hifx4/calibration.pt
```
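At channel granularity, Stage 1 amounts to a running min/max over every input a target `Linear` layer sees during calibration. A minimal sketch of that accumulation; it models only per-channel statistics (the group and branch granularities mentioned in the method overview are not modeled), uses random arrays in place of captured layer inputs, and `ChannelMinMax` is a hypothetical class. In a real run the inputs would be captured from the Wan2.2 transformer, for example with PyTorch forward pre-hooks, and saved to `calibration.pt`.

```python
import numpy as np


class ChannelMinMax:
    """Running per-input-channel min/max, updated once per calibration batch."""

    def __init__(self, num_channels: int):
        self.min = np.full(num_channels, np.inf)
        self.max = np.full(num_channels, -np.inf)

    def update(self, x: np.ndarray) -> None:
        # Flatten all leading dims (batch, tokens, ...) so each column is one channel.
        flat = x.reshape(-1, x.shape[-1])
        self.min = np.minimum(self.min, flat.min(axis=0))
        self.max = np.maximum(self.max, flat.max(axis=0))


# Simulated inputs to one Linear layer across several calibration prompts.
rng = np.random.default_rng(0)
stats = ChannelMinMax(num_channels=8)
batches = [rng.standard_normal((2, 16, 8)) for _ in range(5)]
for batch in batches:
    stats.update(batch)
```

A running reduction keeps memory constant regardless of how many calibration prompts are used.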
4.2 Stage 2: Quantized Artifact Preparation
Stage 2 consumes Stage 1 calibration artifacts and creates the prepared HiFloat4 weight path plus runtime MinMax lookup ranges:
```bash
ART_ROOT="state_quant/hisqrot4_ptq" \
SMOOTHQUANT_ALPHA=0.9 \
bash runfiles/02_prepare_ptq_standard.sh
```
Expected outputs:
```text
${ART_ROOT}/
  low_noise_model/hifx4/prepared.pt
  low_noise_model/hifx4/manifest.json
  high_noise_model/hifx4/prepared.pt
  high_noise_model/hifx4/manifest.json
```
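At runtime, the MinMax lookup replaces on-the-fly range computation: the activation scale comes from the offline range saved in Stage 2 rather than from the current tensor. A minimal sketch of that idea with one symmetric scale per layer; it uses a signed 4-bit integer grid (-7..7) purely as a stand-in because the HiFloat4 encoding is not documented here, and `scale_from_lookup`, `quantize_with_lookup`, and the layer name are hypothetical.

```python
import numpy as np

QMAX = 7  # stand-in symmetric 4-bit integer grid [-7, 7]; not the HiFloat4 format


def scale_from_lookup(act_min: float, act_max: float) -> float:
    """Symmetric scale from an offline (calibrated) activation range."""
    absmax = max(abs(act_min), abs(act_max))
    return absmax / QMAX if absmax > 0 else 1.0


def quantize_with_lookup(x: np.ndarray, scale: float) -> np.ndarray:
    """Fake-quantize x with a precomputed scale; values outside the range are clipped."""
    q = np.clip(np.round(x / scale), -QMAX, QMAX)
    return q * scale


# Hypothetical lookup entry for one layer, as Stage 2 might store it.
lookup = {"blocks.0.attn.q": (-3.5, 2.8)}
scale = scale_from_lookup(*lookup["blocks.0.attn.q"])  # 0.5
x = np.array([0.1, -1.2, 2.6, 9.0])  # 9.0 lies outside the calibrated range
xq = quantize_with_lookup(x, scale)  # [0.0, -1.0, 2.5, 3.5]
```

The trade-off is that no reduction over the activation tensor is needed at inference time, at the cost of clipping any value that exceeds the calibrated range.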
Set ROTATION_PATH only if you want to override the internal deterministic Hadamard-style rotation with a local rotation checkpoint.
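The rotation can be folded into a `Linear` layer because an orthonormal matrix cancels out: with H orthonormal, (XH)(WH)^T = X H H^T W^T = X W^T, while XH spreads an outlier channel's energy across all channels. A minimal sketch using a Sylvester-construction Hadamard matrix (power-of-two sizes only); HiSQRot4's own "Hadamard-style" matrix may be built differently, and `hadamard` is a hypothetical helper.

```python
import numpy as np


def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix via Sylvester's construction (n must be a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
x[:, 5] *= 40.0  # outlier input channel
w = rng.standard_normal((8, 16))
H = hadamard(16)

x_rot = x @ H  # rotated activations
w_rot = w @ H  # rotation folded into the weights offline
y_ref = x @ w.T
y_rot = x_rot @ w_rot.T
```

The rotation preserves each row's norm, so the smaller peak magnitude after rotation comes from redistribution, not from discarding information.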
4.3 Stage 3: Inference with Rebuilt Artifacts
Point ART_ROOT at your rebuilt artifact root:
```bash
ART_ROOT="state_quant/hisqrot4_ptq" \
PROMPT="A cinematic shot of a cat surfing on the sea." \
OUTPUT_FILE="video_output/hifx4/custom_stage123_sample.mp4" \
bash runfiles/03_infer_ptq_standard.sh
```
For batch inference with rebuilt artifacts:
```bash
ART_ROOT="state_quant/hisqrot4_ptq" \
PROMPT_FILE="data/prompts/OpenS2V-5M_to_mm_vbench_30.json" \
OUT_FOLDER="video_output/hifx4/custom_stage123_vbench30" \
bash runfiles/03_batch_custom_prompt_file_infer.sh
```
5. VBench Evaluation
Install evaluation dependencies:
```bash
pip install -r requirements-vbench.txt
pip install --no-build-isolation \
  "detectron2 @ git+https://github.com/facebookresearch/detectron2.git@8a9d885b3d4dcf1bef015f0593b872ed8d32b4ab"
```
After batch generation, evaluate the output directory:
```bash
VIDEOS_INPUT_DIR="video_output/hifx4/OpenS2V-5M_to_mm_vbench_30_alpha0p9" \
RUN_TAG="vbench_hisqrot4_alpha0p9_vbench30" \
bash runfiles/03_eval_vbench_video_dir_custom5.sh
```
Evaluation results are written to:
```text
eval_output/vbench_hisqrot4_alpha0p9_vbench30/
```
Set EXPECTED_VIDEO_CNT only when you want the evaluation script to validate an exact number of generated videos.
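A check of this kind can be sketched as counting the generated `.mp4` files before evaluation. This is an assumption about what the evaluation script validates; `expected_from_env`, `count_videos`, and `check_expected_count` are hypothetical, and the real script may count differently.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def expected_from_env() -> Optional[int]:
    """Read EXPECTED_VIDEO_CNT; None (no check) when it is unset or empty."""
    value = os.environ.get("EXPECTED_VIDEO_CNT", "").strip()
    return int(value) if value else None


def count_videos(videos_dir: str, suffix: str = ".mp4") -> int:
    """Count files with the given suffix directly inside videos_dir (not recursive)."""
    return sum(1 for p in Path(videos_dir).iterdir() if p.is_file() and p.suffix == suffix)


def check_expected_count(videos_dir: str, expected: Optional[int]) -> None:
    """Raise ValueError on a mismatch; do nothing when expected is None."""
    if expected is None:
        return
    found = count_videos(videos_dir)
    if found != expected:
        raise ValueError(f"expected {expected} videos in {videos_dir}, found {found}")


# Demo on a throwaway directory with two videos and one unrelated file.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.mp4", "b.mp4", "notes.txt"):
        Path(demo_dir, name).touch()
    check_expected_count(demo_dir, 2)     # passes
    check_expected_count(demo_dir, None)  # skipped, like leaving EXPECTED_VIDEO_CNT unset
```

In a real run, the call would look like `check_expected_count(os.environ["VIDEOS_INPUT_DIR"], expected_from_env())`.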
6. Repository Layout
```text
HiSQRot4/
  generate.py
  hifx4_linear_quant.py
  hifx4_ptq_backend.py
  hifloat4/
  wan2.2/
  runfiles/
  data/prompts/OpenS2V-5M_to_mm_vbench_30.json
  models/Wan2.2-T2V-A14B/
  artifacts/hisqrot4_alpha_0p9/
  requirements.txt
```
Large model and artifact files are intended to be uploaded with Git LFS. This repository includes .gitattributes entries for *.safetensors, *.pt, *.pth, *.bin, and *.onnx.
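Git LFS `.gitattributes` entries take the form `*.safetensors filter=lfs diff=lfs merge=lfs -text`, which is what `git lfs track` writes. The sketch below builds those lines for the listed extensions and checks which paths they would route through LFS. It uses `fnmatch` on the basename as an approximation of gitattributes pattern matching, which has its own rules; `lfs_tracked` is hypothetical.

```python
import fnmatch
import posixpath

LFS_PATTERNS = ["*.safetensors", "*.pt", "*.pth", "*.bin", "*.onnx"]

# One .gitattributes line per pattern, in the format `git lfs track` writes.
GITATTRIBUTES = "\n".join(f"{p} filter=lfs diff=lfs merge=lfs -text" for p in LFS_PATTERNS)


def lfs_tracked(repo_path: str) -> bool:
    """Approximate check: a slash-free pattern matches the file's basename at any depth."""
    name = posixpath.basename(repo_path)
    return any(fnmatch.fnmatchcase(name, p) for p in LFS_PATTERNS)


print(GITATTRIBUTES)
for path in ["state_quant/hisqrot4_ptq/low_noise_model/hifx4/prepared.pt",
             "state_quant/hisqrot4_ptq/low_noise_model/hifx4/manifest.json"]:
    print(path, lfs_tracked(path))
```

Small text files such as `manifest.json` stay in regular Git, while the `.pt` artifacts go through LFS.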
7. Acknowledgements
HiSQRot4 builds on:
Please cite the upstream Wan2.2 and relevant quantization work when using this project in research.