
Standalone Inference Helper

This folder contains a portable inference helper for:

sfp4_v4_sparse09_hpo_on_ours_p_init2050_1n_interactive/checkpoint-700

It is not a full vendored copy of Wan or FastVideo. Instead, it bundles the sparse FP4 attention backend as an overlay, together with a runner script; applying the overlay to a FastVideo checkout or installation lets the uploaded checkpoint be used for normal inference.

Contents

  • run_inference.py: downloads (or loads a local copy of) transformer/diffusion_pytorch_model.safetensors from yitongl/sparse_quant_exp and runs FastVideo's VideoGenerator.
  • run.sh: convenience wrapper that installs the overlay into FASTVIDEO_ROOT and then runs run_inference.py.
  • install_overlay.py: copies the bundled sparse FP4 backend files into a FastVideo checkout/install (see the sketch after this list).
  • overlay_files/: the exact runtime source files needed by the SPARSE_FP4_OURS_P_ATTN backend.
  • training_attention_settings.json: structured settings for the uploaded checkpoint.
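
For orientation, the overlay install step boils down to copying overlay_files/ over the matching paths in a FastVideo tree. A minimal sketch of that idea, assuming the overlay mirrors FastVideo's layout (the real install_overlay.py may map paths differently and do extra validation):

import os
import shutil
from pathlib import Path

# Hypothetical sketch: mirror overlay_files/ into the FastVideo tree,
# preserving relative paths. Not the shipped install_overlay.py.
overlay_root = Path(__file__).parent / "overlay_files"
fastvideo_root = Path(os.environ["FASTVIDEO_ROOT"])

for src in overlay_root.rglob("*"):
    if src.is_file():
        dst = fastvideo_root / src.relative_to(overlay_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # overwrite the existing file, keep metadata
        print(f"installed {dst}")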

Expected Environment

  • A working FastVideo Python environment.
  • FastVideo dependencies installed, including PyTorch, Triton, safetensors, and Hugging Face Hub.
  • Access to the base model Wan-AI/Wan2.1-T2V-1.3B-Diffusers.
  • A CUDA GPU supported by the custom Triton kernels.
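
A quick preflight that checks these prerequisites can look like the following (illustrative only; not part of the helper):

# Illustrative preflight check; not shipped with the helper.
import importlib.util

for mod in ("torch", "triton", "safetensors", "huggingface_hub", "fastvideo"):
    assert importlib.util.find_spec(mod) is not None, f"missing dependency: {mod}"

import torch
assert torch.cuda.is_available(), "the custom Triton kernels need a CUDA GPU"
print("CUDA device:", torch.cuda.get_device_name(0))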

Usage

On a machine with this HF repo downloaded:

export FASTVIDEO_ROOT=/path/to/FastVideo
bash standalone_inference/run.sh \
  --output-path outputs/sfp4_checkpoint_700 \
  --seed 1000

The script sets:

FASTVIDEO_ATTENTION_BACKEND=SPARSE_FP4_OURS_P_ATTN
FASTVIDEO_SPARSE_FP4_USE_HIGH_PREC_O=1

and downloads the uploaded checkpoint-700 transformer weights unless --weights is provided.
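
In Python terms, what run.sh arranges is roughly the following (a sketch; the actual flag handling lives in run_inference.py):

import os
from huggingface_hub import hf_hub_download

# Sketch of what run.sh sets up before invoking run_inference.py.
os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "SPARSE_FP4_OURS_P_ATTN"
os.environ["FASTVIDEO_SPARSE_FP4_USE_HIGH_PREC_O"] = "1"

# Default weight resolution: pull the checkpoint-700 transformer weights
# from the HF repo (skipped when --weights points at a local file).
weights_path = hf_hub_download(
    repo_id="yitongl/sparse_quant_exp",
    filename="transformer/diffusion_pytorch_model.safetensors",
)
print(weights_path)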

To use a local safetensors file:

export FASTVIDEO_ROOT=/path/to/FastVideo
bash standalone_inference/run.sh \
  --weights /path/to/diffusion_pytorch_model.safetensors \
  --prompt "your prompt"

Attention Semantics

  • Self-attention uses SPARSE_FP4_OURS_P_ATTN.
  • Q/K/V use FP4 fake quantization with STE (straight-through estimator); a generic sketch follows this list.
  • VSA tile size is 4 x 4 x 4 = 64 tokens.
  • Selected sparse tiles use group-local P quantization in the Triton kernel.
  • Dropped tiles use tile mean compensation.
  • Cross-attention falls back to dense SDPA and is not sparse/FP4.
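
To make the Q/K/V point concrete, here is a generic PyTorch sketch of FP4 (E2M1) fake quantization with a straight-through estimator. It illustrates the technique only; the real kernel quantizes group-locally inside Triton, and its exact scaling and grouping are not reproduced here:

import torch

# E2M1 FP4 magnitudes; adding a sign bit yields the full FP4 value grid.
_FP4_LEVELS = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fp4_fake_quant_ste(x: torch.Tensor) -> torch.Tensor:
    """Fake-quantize x onto the FP4 grid; gradients pass straight through."""
    levels = _FP4_LEVELS.to(device=x.device, dtype=x.dtype)
    scale = x.abs().amax().clamp(min=1e-8) / levels[-1]  # map max |x| to 6.0
    mag = (x / scale).abs()
    # snap each magnitude to the nearest representable FP4 level
    idx = (mag.unsqueeze(-1) - levels).abs().argmin(dim=-1)
    q = torch.sign(x) * levels[idx] * scale
    # STE: forward pass uses q, backward treats quantization as identity
    return x + (q - x).detach()

This sketch uses a single per-tensor scale for brevity; the backend's group-local P quantization instead scales within each selected tile group.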

Checkpoint

The transformer weights currently on the repo's main branch come from checkpoint-700:

transformer/diffusion_pytorch_model.safetensors

SHA256 of the local file, recorded when preparing this helper:

4595ca81ea7085c15ccf14b738aa9c0fdf2d2786641f49b55e0bc0e99bf042d2
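
To verify that a downloaded file matches this digest (plain Python, not part of the helper):

import hashlib
import sys

EXPECTED = "4595ca81ea7085c15ccf14b738aa9c0fdf2d2786641f49b55e0bc0e99bf042d2"

h = hashlib.sha256()
with open(sys.argv[1], "rb") as f:  # path to the .safetensors file
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
print("OK" if h.hexdigest() == EXPECTED else "MISMATCH", h.hexdigest())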