PhysRAG Wan2.2 TI2V-5B
Authors: Kexu Cheng, Zicheng Liu, Mingju Gao, Chunhe Song, Hao Tang
Project page: https://sediment1024.github.io/PhysRAG/
Paper: PhysRAG: Enhancing Physics-Awareness in Video Generation via Retrieval-Augmented Generation (coming soon)
Code: https://github.com/sediment1024/PhysRAG
Dataset: https://huggingface.co/datasets/sediment1024/PhysRAG
This repository contains the physical-injection checkpoint used by PhysRAG, built on top of Wan2.2 TI2V-5B. The base Wan2.2 checkpoint is not included and must be obtained separately.
Configuration
- 49 frames at 704 x 480 (width x height)
- Physical injection at DiT blocks 0, 1, and 2
- 128 learnable queries
- Adapter dimension 16
- VideoCLIP-XL retrieval over a 170-video physical reference library
- VideoMAE-V2 features cached offline
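For reference, the settings above can be gathered into a single object. This is only a sketch: the class name `PhysRAGConfig`, its field names, and the helper `uses_injection` are hypothetical and are not taken from the released code; only the values come from the list above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PhysRAGConfig:
    """Hypothetical container for the values listed in this model card."""

    num_frames: int = 49                           # frames per generated clip
    width: int = 704                               # pixels
    height: int = 480                              # pixels
    injection_blocks: Tuple[int, ...] = (0, 1, 2)  # DiT blocks that receive physical injection
    num_queries: int = 128                         # learnable queries
    adapter_dim: int = 16                          # adapter dimension
    library_size: int = 170                        # videos in the physical reference library

    def uses_injection(self, block_index: int) -> bool:
        """Return True if the given DiT block index is one of the injected blocks."""
        return block_index in self.injection_blocks


cfg = PhysRAGConfig()
```

Keeping the values in one frozen object makes it harder for inference scripts to drift from the settings the checkpoint was trained with.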
Checkpoint loading
merged_model.pt is the rank-0 sparse state dict produced by the original
DeepSpeed ZeRO-3 training run. Empty partition tensors are intentionally skipped
by the PhysRAG checkpoint loader. Use the loader included in the companion code
repository; loading this file directly with strict load_state_dict is not
supported.
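To illustrate why a strict load fails, the sketch below shows the general idea of skipping empty tensors and loading the rest non-strictly. It is not the companion loader and does not replace it: the function names are hypothetical, and it assumes the file deserializes to a flat mapping from parameter names to tensors, which this card does not confirm.

```python
from typing import Any, Dict, List, Tuple


def drop_empty_partitions(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only entries that hold at least one element.

    Entries without a numel() method or with zero elements are discarded.
    """
    return {
        name: tensor
        for name, tensor in state_dict.items()
        if hasattr(tensor, "numel") and tensor.numel() > 0
    }


def load_sparse_checkpoint(model: Any, path: str) -> Tuple[List[str], List[str]]:
    """Load the non-empty tensors from `path` into `model` without strict matching.

    Assumption: `path` holds a flat name -> tensor mapping.
    """
    import torch  # imported lazily so the filter above works without PyTorch installed

    raw = torch.load(path, map_location="cpu", weights_only=True)
    kept = drop_empty_partitions(raw)
    # strict=False tolerates parameters that are absent from the sparse file.
    result = model.load_state_dict(kept, strict=False)
    return list(result.missing_keys), list(result.unexpected_keys)
```

Inspecting the returned missing and unexpected keys is a quick way to see which parameters the checkpoint actually supplied. For real use, prefer the loader from the companion code repository.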
The SHA-256 checksum is
ae60ae88911560b48b1172e3302586b07a6da1f70fcea32229cacddcb702321d.
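A download can be checked against this value with Python's standard `hashlib` module. The sketch below is a minimal example; the function names are illustrative, and the path passed to `verify_checkpoint` is whatever location the file was saved to.

```python
import hashlib

# Checksum published in this model card for merged_model.pt.
EXPECTED_SHA256 = "ae60ae88911560b48b1172e3302586b07a6da1f70fcea32229cacddcb702321d"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's digest matches the expected checksum."""
    return sha256_of_file(path) == expected.lower()
```

Calling `verify_checkpoint("merged_model.pt")` before loading catches truncated or corrupted downloads early. Reading in chunks keeps memory use flat regardless of file size.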
Required assets
- Wan2.2 TI2V-5B base model
- PhysRAG 170-video RAG library and FAISS index
- VideoCLIP-XL retriever checkpoint
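The retrieval assets above support a nearest-neighbour lookup: a query is embedded, compared against the library embeddings, and the closest reference videos are returned. The sketch below shows that idea only. It swaps in brute-force cosine similarity over toy three-dimensional vectors instead of VideoCLIP-XL embeddings and a FAISS index, and the video names, vectors, and function names are all made up for illustration. The metric used by the released index is not stated in this card.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k library entries most similar to the query, best first."""
    scored = [(name, cosine(query, emb)) for name, emb in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy stand-ins for library embeddings; real embeddings would come from the retriever.
library = {
    "falling_ball": [0.9, 0.1, 0.0],
    "pouring_water": [0.1, 0.9, 0.2],
    "swinging_pendulum": [0.7, 0.2, 0.6],
}
query = [1.0, 0.0, 0.1]
best = top_k(query, library, k=2)
```

With a library of 170 videos, an exhaustive scan like this is cheap; an index mainly matters when the library grows much larger.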
The paper link will be added once the public manuscript page is available.
Citation
@misc{cheng2026physragenhancingphysicsawarenessvideo,
title={PhysRAG: Enhancing Physics-Awareness in Video Generation via Retrieval-Augmented Generation},
author={Kexu Cheng and Zicheng Liu and Mingju Gao and Chunhe Song and Hao Tang},
year={2026},
eprint={2606.26916},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2606.26916},
}