---
license: apache-2.0
---
# ⚡ FlashVSR

**Towards Real-Time Diffusion-Based Streaming Video Super-Resolution**

**Authors:** Junhao Zhuang, Shi Guo, Xin Cai, Xiaohui Li, Yihao Liu, Chun Yuan, Tianfan Xue

<a href='http://zhuang2002.github.io/FlashVSR'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp;
<a href="https://github.com/OpenImagingLab/FlashVSR"><img src="https://img.shields.io/badge/GitHub-Repository-black?logo=github"></a> &nbsp;
<a href="https://huggingface.co/JunhaoZhuang/FlashVSR"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue"></a> &nbsp;
<a href="https://huggingface.co/datasets/JunhaoZhuang/VSR-120K"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-orange"></a> &nbsp;
<a href="#"><img src="https://img.shields.io/badge/arXiv-TBD-b31b1b.svg"></a>

**Your star means a lot to us in developing this project!** :star:

<img src="https://raw.githubusercontent.com/OpenImagingLab/FlashVSR/main/examples/WanVSR/assert/teaser.png" />

---

### 🌟 Abstract

Diffusion models have recently advanced video restoration, but applying them to real-world video super-resolution (VSR) remains challenging due to high latency, prohibitive computation, and poor generalization to ultra-high resolutions. Our goal in this work is to make diffusion-based VSR practical by achieving **efficiency, scalability, and real-time performance**. To this end, we propose **FlashVSR**, the first diffusion-based one-step streaming framework towards real-time VSR. **FlashVSR runs at ∼17 FPS for 768 × 1408 videos on a single A100 GPU** by combining three complementary innovations: (i) a train-friendly three-stage distillation pipeline that enables streaming super-resolution, (ii) locality-constrained sparse attention that cuts redundant computation while bridging the train–test resolution gap, and (iii) a tiny conditional decoder that accelerates reconstruction without sacrificing quality. To support large-scale training, we also construct **VSR-120K**, a new dataset with 120k videos and 180k images. Extensive experiments show that FlashVSR scales reliably to ultra-high resolutions and achieves **state-of-the-art performance with up to ∼12× speedup** over prior one-step diffusion VSR models.

---

### 📰 News

- **Release Date:** October 2025 — Inference code and model weights are available now! 🎉
- **Coming Soon:** Dataset release (**VSR-120K**) for large-scale training.

---

### 📋 TODO

- ✅ Release inference code and model weights
- ⬜ Release dataset (VSR-120K)

---

### 🚀 Getting Started

Follow these steps to set up and run **FlashVSR** on your local machine:

#### 1️⃣ Clone the Repository

```bash
git clone https://github.com/OpenImagingLab/FlashVSR
cd FlashVSR
```

#### 2️⃣ Set Up the Python Environment

Create and activate the environment (**Python 3.11.13**):

```bash
conda create -n flashvsr python=3.11.13
conda activate flashvsr
```

Install project dependencies:

```bash
pip install -e .
pip install -r requirements.txt
```

#### 3️⃣ Install Block-Sparse Attention (Required)

FlashVSR **requires** the **Block-Sparse Attention** backend for inference:

```bash
git clone https://github.com/mit-han-lab/Block-Sparse-Attention
cd Block-Sparse-Attention
pip install packaging
pip install ninja
python setup.py install
```
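
Before moving on, you can optionally sanity-check that the extension compiled against your CUDA toolchain. This is a minimal check we suggest, not part of the official setup; the module name `block_sparse_attn` is an assumption based on the upstream repository and may differ in your build:

```bash
# Optional sanity check (assumed module name: block_sparse_attn; adjust if your build differs)
python -c "import torch, block_sparse_attn; print(torch.__version__, '- block-sparse attention import OK')"
```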

#### 4️⃣ Download Model Weights from Hugging Face

Weights are hosted on **Hugging Face** via **Git LFS**. Please install Git LFS first:

```bash
# From the repo root
cd examples/WanVSR

# Install Git LFS (once per machine)
git lfs install

# Clone the model repository into examples/WanVSR
git lfs clone https://huggingface.co/JunhaoZhuang/FlashVSR
```

After cloning, you should have:

```
./examples/WanVSR/FlashVSR/
│
├── LQ_proj_in.ckpt
├── TCDecoder.ckpt
├── Wan2.1_VAE.pth
├── diffusion_pytorch_model_streaming_dmd.safetensors
└── README.md
```

> The inference scripts will load weights from `./examples/WanVSR/FlashVSR/` by default.
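
If you prefer not to use Git LFS, the same weights can also be fetched with the `huggingface_hub` Python client. This is an optional alternative to the steps above (it assumes `huggingface_hub` is installed via `pip install huggingface_hub`):

```python
# Optional alternative to Git LFS: download the model repo with huggingface_hub.
from huggingface_hub import snapshot_download

# Place the weights where the inference scripts expect them by default.
snapshot_download(
    repo_id="JunhaoZhuang/FlashVSR",
    local_dir="./examples/WanVSR/FlashVSR",
)
```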

#### 5️⃣ Run Inference

```bash
# From the repo root
cd examples/WanVSR
python infer_flashvsr_full.py   # Full model
# or
python infer_flashvsr_tiny.py   # Tiny model
```
119
+
120
+ ---
121
+
122
+ ### πŸ› οΈ Method
123
+
124
+ The overview of **FlashVSR**. This framework features:
125
+
126
+ * **Three-Stage Distillation Pipeline** for streaming VSR training.
127
+ * **Locality-Constrained Sparse Attention** to cut redundant computation and bridge the train–test resolution gap.
128
+ * **Tiny Conditional Decoder** for efficient, high-quality reconstruction.
129
+ * **VSR-120K Dataset** consisting of **120k videos** and **180k images**, supports joint training on both images and videos.
130
+
131
+ <img src="https://raw.githubusercontent.com/OpenImagingLab/FlashVSR/main/examples/WanVSR/assert/flowchart.jpg" width="1000" />
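
To build intuition for the locality constraint, here is a minimal, illustrative sketch of block-level local attention in PyTorch: each query block attends only to key blocks within a fixed window, which is the kind of block mask a block-sparse kernel consumes. The helper names (`make_local_block_mask`, `window`) are hypothetical, and this is not the actual FlashVSR kernel:

```python
# Illustrative sketch only: a locality-constrained block-sparse attention mask.
import torch
import torch.nn.functional as F

def make_local_block_mask(n_blocks: int, window: int) -> torch.Tensor:
    """Query block i may attend key block j only when |i - j| <= window."""
    idx = torch.arange(n_blocks)
    return (idx[:, None] - idx[None, :]).abs() <= window  # bool, (n_blocks, n_blocks)

def local_block_attention(q, k, v, block_size: int = 64, window: int = 1):
    # q, k, v: (batch, heads, seq_len, dim); seq_len must be divisible by block_size.
    n_blocks = q.shape[2] // block_size
    block_mask = make_local_block_mask(n_blocks, window).to(q.device)
    # Expand the block mask to token level. A real block-sparse kernel would
    # skip masked blocks entirely instead of materializing this dense mask.
    token_mask = block_mask.repeat_interleave(block_size, 0).repeat_interleave(block_size, 1)
    return F.scaled_dot_product_attention(q, k, v, attn_mask=token_mask)

q = k = v = torch.randn(1, 8, 256, 64)
out = local_block_attention(q, k, v)  # -> (1, 8, 256, 64)
```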

---

### 🤗 Feedback & Support

We welcome feedback and issues. Thank you for trying **FlashVSR**!

---

### 📄 Acknowledgments

We gratefully acknowledge the following open-source projects:

* **DiffSynth Studio** — [https://github.com/modelscope/DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio)
* **Block-Sparse-Attention** — [https://github.com/mit-han-lab/Block-Sparse-Attention](https://github.com/mit-han-lab/Block-Sparse-Attention)
* **taehv** — [https://github.com/madebyollin/taehv](https://github.com/madebyollin/taehv)

---

### 📞 Contact

* **Junhao Zhuang**
  Email: [zhuangjh23@mails.tsinghua.edu.cn](mailto:zhuangjh23@mails.tsinghua.edu.cn)

---

### 📜 Citation

```bibtex
@misc{zhuang2025flashvsr,
  title={FLASHVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution},
  author={Junhao Zhuang and Shi Guo and Xin Cai and Xiaohui Li and Yihao Liu and Chun Yuan and Tianfan Xue},
  year={2025},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={http://zhuang2002.github.io/FlashVSR}
}
```