TechCarbasa committed · Commit 9e08893 · verified · 1 Parent(s): a04a20a

Upload /workspace/ComfyUI/models/FlashVSR-v1.1/README.md with huggingface_hub

workspace/ComfyUI/models/FlashVSR-v1.1/README.md ADDED

---
license: apache-2.0
pipeline_tag: video-to-video
---

# ⚡ FlashVSR

**Towards Real-Time Diffusion-Based Streaming Video Super-Resolution**

**Authors:** Junhao Zhuang, Shi Guo, Xin Cai, Xiaohui Li, Yihao Liu, Chun Yuan, Tianfan Xue

<a href='http://zhuang2002.github.io/FlashVSR'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp;
<a href="https://github.com/OpenImagingLab/FlashVSR"><img src="https://img.shields.io/badge/GitHub-Repository-black?logo=github"></a> &nbsp;
<a href="https://huggingface.co/JunhaoZhuang/FlashVSR"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20(v1)-blue"></a> &nbsp;
<a href="https://huggingface.co/JunhaoZhuang/FlashVSR-v1.1"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20(v1.1)-blue"></a> &nbsp;
<a href="https://huggingface.co/datasets/JunhaoZhuang/VSR-120K"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-orange"></a> &nbsp;
<a href="https://arxiv.org/abs/2510.12747"><img src="https://img.shields.io/badge/arXiv-2510.12747-b31b1b.svg"></a>

**Your star means a lot to us as we develop this project!** :star:

<img src="https://raw.githubusercontent.com/OpenImagingLab/FlashVSR/main/examples/WanVSR/assets/teaser.png" />

---

### 🌟 Abstract

Diffusion models have recently advanced video restoration, but applying them to real-world video super-resolution (VSR) remains challenging due to high latency, prohibitive computation, and poor generalization to ultra-high resolutions. Our goal in this work is to make diffusion-based VSR practical by achieving **efficiency, scalability, and real-time performance**. To this end, we propose **FlashVSR**, the first diffusion-based one-step streaming framework towards real-time VSR. **FlashVSR runs at ∼17 FPS for 768 × 1408 videos on a single A100 GPU** by combining three complementary innovations: (i) a train-friendly three-stage distillation pipeline that enables streaming super-resolution, (ii) locality-constrained sparse attention that cuts redundant computation while bridging the train–test resolution gap, and (iii) a tiny conditional decoder that accelerates reconstruction without sacrificing quality. To support large-scale training, we also construct **VSR-120K**, a new dataset with 120k videos and 180k images. Extensive experiments show that FlashVSR scales reliably to ultra-high resolutions and achieves **state-of-the-art performance with up to ∼12× speedup** over prior one-step diffusion VSR models.

---

### 📰 News

- **Nov 2025:** 🎉 [FlashVSR v1.1](https://huggingface.co/JunhaoZhuang/FlashVSR-v1.1) released, with enhanced stability and fidelity.
- **Oct 2025:** [FlashVSR v1](https://huggingface.co/JunhaoZhuang/FlashVSR) (initial release). Inference code and model weights are available now! 🎉
- **Bug Fix (October 21, 2025):** Fixed the `local_attention_mask` update logic to prevent artifacts when switching between different aspect ratios during continuous inference.
- **Coming Soon:** Dataset release (**VSR-120K**) for large-scale training.

---

### 📢 Important Quality Note (ComfyUI & other third-party implementations)

First of all, huge thanks to the community for the fast adoption, feedback, and contributions to FlashVSR! 🙌
During community testing, we noticed that some third-party implementations of FlashVSR (e.g. early ComfyUI versions) do **not include our Locality-Constrained Sparse Attention (LCSA)** module and instead fall back to **dense attention**. This may lead to **noticeable quality degradation**, especially at higher resolutions.
Community discussion: https://github.com/kijai/ComfyUI-WanVideoWrapper/issues/1441

Below is a comparison example provided by a community member:

| Fig. 1 – LR Input Video | Fig. 2 – Third-party (no LCSA) | Fig. 3 – Official FlashVSR |
|-------------------------|--------------------------------|----------------------------|
| <video src="https://github.com/user-attachments/assets/ea12a191-48d5-47c0-a8e5-e19ad13581a9" controls width="260"></video> | <video src="https://github.com/user-attachments/assets/c8e53bd5-7eca-420d-9cc6-2b9c06831047" controls width="260"></video> | <video src="https://github.com/user-attachments/assets/a4d80618-d13d-4346-8e37-38d2fabf9827" controls width="260"></video> |

✅ The **official FlashVSR pipeline (this repository)**:
- **Better preserves fine structures and details**
- **Effectively avoids texture aliasing and visual artifacts**

We are also working on a **version that does not rely on the Block-Sparse Attention library** while keeping **the same output quality**; this alternative may run slower than the optimized original implementation.

Thanks again to the community for actively testing and helping improve FlashVSR together! 🚀
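
If you are unsure whether your local setup can use the sparse backend at all, a quick import probe is an easy first check. This is only a minimal sketch: it assumes the Block-Sparse-Attention build installs an importable package named `block_sparse_attn` (check the library's own documentation if the name differs), and a successful import does not by itself guarantee that a given third-party wrapper actually routes attention through LCSA.

```python
# Hypothetical sanity probe: can the sparse-attention backend be found at all?
# Assumes (not confirmed here) that the build exposes a package named `block_sparse_attn`.
import importlib.util


def sparse_backend_available(module_name: str = "block_sparse_attn") -> bool:
    """Return True if the (assumed) sparse-attention package is importable."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if sparse_backend_available():
        print("Sparse-attention backend found; LCSA can be used.")
    else:
        print("Backend not found; the pipeline would fall back to dense attention.")
```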

---

### 📋 TODO

- ✅ Release inference code and model weights
- ⬜ Release dataset (VSR-120K)

---

### 🚀 Getting Started

Follow these steps to set up and run **FlashVSR** on your local machine:

> ⚠️ **Note:** This project is primarily designed and optimized for **4× video super-resolution**.
> We **strongly recommend** using the **4× SR setting** to achieve better results and stability. ✅

#### 1️⃣ Clone the Repository

```bash
git clone https://github.com/OpenImagingLab/FlashVSR
cd FlashVSR
```

#### 2️⃣ Set Up the Python Environment

Create and activate the environment (**Python 3.11.13**):

```bash
conda create -n flashvsr python=3.11.13
conda activate flashvsr
```

Install project dependencies:

```bash
pip install -e .
pip install -r requirements.txt
```

#### 3️⃣ Install Block-Sparse Attention (Required)

FlashVSR relies on the **Block-Sparse Attention** backend to enable flexible and dynamic attention masking for efficient inference.

> **⚠️ Note:**
>
> * The Block-Sparse Attention build process can be memory-intensive, especially when compiling in parallel with multiple `ninja` jobs. It is recommended to keep sufficient memory available during compilation to avoid OOM errors (a job-limit hint is included in the commands below). Once the build is complete, runtime memory usage is stable and not an issue.
> * Based on our testing, the Block-Sparse Attention backend works correctly on **NVIDIA A100 and A800** (Ampere) with **ideal acceleration performance**. It also runs correctly on **H200** (Hopper), but the acceleration there is limited due to hardware scheduling differences and sparse kernel behavior. **Compatibility and performance on other GPUs (e.g., RTX 40/50 series or H800) are currently unknown.** For more details, please refer to the official documentation: https://github.com/mit-han-lab/Block-Sparse-Attention

```bash
# ✅ Recommended: clone and install in a separate clean folder (outside the FlashVSR repo)
git clone https://github.com/mit-han-lab/Block-Sparse-Attention
cd Block-Sparse-Attention
pip install packaging
pip install ninja
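# Optional hint (not part of the original instructions): if the parallel build runs
# out of memory, limiting the number of compile jobs can help. MAX_JOBS is honored by
# PyTorch's C++ extension builder, e.g.:  MAX_JOBS=4 python setup.py install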
python setup.py install
```

#### 4️⃣ Download Model Weights from Hugging Face

FlashVSR provides both **v1** and **v1.1** model weights on Hugging Face (via **Git LFS**).
Please install Git LFS first:

```bash
# From the repo root
cd examples/WanVSR

# Install Git LFS (once per machine)
git lfs install

# Clone v1 (original) or v1.1 (recommended)
git lfs clone https://huggingface.co/JunhaoZhuang/FlashVSR       # v1
# or
git lfs clone https://huggingface.co/JunhaoZhuang/FlashVSR-v1.1  # v1.1
```
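
As an alternative to Git LFS, the same weights can be fetched with the `huggingface_hub` Python API. A minimal sketch (the `local_dir` below simply mirrors the folder layout expected by the inference scripts; adjust it if you keep weights elsewhere):

```python
# Download FlashVSR weights via the Hugging Face Hub API instead of Git LFS.
from huggingface_hub import snapshot_download

# v1.1 (recommended); use repo_id="JunhaoZhuang/FlashVSR" for v1.
snapshot_download(
    repo_id="JunhaoZhuang/FlashVSR-v1.1",
    local_dir="./examples/WanVSR/FlashVSR-v1.1",
)
```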

After cloning, you should have one of the following folders:

```
./examples/WanVSR/FlashVSR/       # v1
./examples/WanVSR/FlashVSR-v1.1/  # v1.1
│
├── LQ_proj_in.ckpt
├── TCDecoder.ckpt
├── Wan2.1_VAE.pth
├── diffusion_pytorch_model_streaming_dmd.safetensors
└── README.md
```

> Inference scripts automatically load weights from the corresponding folder.

---

#### 5️⃣ Run Inference

```bash
# From the repo root
cd examples/WanVSR

# v1 (original)
python infer_flashvsr_full.py
# or
python infer_flashvsr_tiny.py
# or
python infer_flashvsr_tiny_long_video.py

# v1.1 (recommended)
python infer_flashvsr_v1.1_full.py
# or
python infer_flashvsr_v1.1_tiny.py
# or
python infer_flashvsr_v1.1_tiny_long_video.py
```
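
Before launching a script, it can save time to verify that the weight files from step 4 are actually in place. A small, optional pre-flight check; the file names are taken from the folder listing above, and `FlashVSR-v1.1` is assumed as the target folder:

```python
# Optional pre-flight check: confirm the expected weight files exist before inference.
from pathlib import Path

WEIGHTS_DIR = Path("./FlashVSR-v1.1")  # run from examples/WanVSR; use ./FlashVSR for v1
EXPECTED = [
    "LQ_proj_in.ckpt",
    "TCDecoder.ckpt",
    "Wan2.1_VAE.pth",
    "diffusion_pytorch_model_streaming_dmd.safetensors",
]

missing = [name for name in EXPECTED if not (WEIGHTS_DIR / name).is_file()]
if missing:
    print(f"Missing weight files in {WEIGHTS_DIR}: {missing}")
else:
    print("All expected weight files are present.")
```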

---

### 🛠️ Method

An overview of **FlashVSR**. The framework features:

* **Three-Stage Distillation Pipeline** for streaming VSR training.
* **Locality-Constrained Sparse Attention** to cut redundant computation and bridge the train–test resolution gap (a toy sketch of the masking idea follows the figure below).
* **Tiny Conditional Decoder** for efficient, high-quality reconstruction.
* **VSR-120K Dataset**, consisting of **120k videos** and **180k images**, which supports joint training on both images and videos.

<img src="https://raw.githubusercontent.com/OpenImagingLab/FlashVSR/main/examples/WanVSR/assets/flowchart.jpg" width="1000" />
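
The actual LCSA sparse kernels come from the Block-Sparse Attention backend installed in step 3. Purely as an illustration of the locality idea referenced in the bullet list above, the sketch below builds a block-level mask in which each query block may attend only to key blocks within a small spatial window. The grid size, window radius, and Chebyshev-distance rule are arbitrary assumptions for this toy example and do not reproduce the repository's real mask construction.

```python
# Illustrative only: a toy block-level locality mask, NOT the repository's LCSA kernel.
# Tokens are assumed to lie on an (h_blocks x w_blocks) grid of latent blocks.
import torch


def toy_locality_mask(h_blocks: int, w_blocks: int, window: int = 2) -> torch.Tensor:
    """Boolean (N, N) mask over N = h_blocks * w_blocks blocks; True = attention allowed."""
    ys, xs = torch.meshgrid(
        torch.arange(h_blocks), torch.arange(w_blocks), indexing="ij"
    )
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1)            # (N, 2) grid coords
    dist = (coords[:, None, :] - coords[None, :, :]).abs().amax(dim=-1)   # Chebyshev distance
    return dist <= window                                                 # keep only local pairs


mask = toy_locality_mask(h_blocks=6, w_blocks=11, window=2)
print(mask.shape, f"{mask.float().mean().item():.2%} of block pairs kept")
```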

---

### 🤗 Feedback & Support

We welcome feedback and issues. Thank you for trying **FlashVSR**!

---

### 📄 Acknowledgments

We gratefully acknowledge the following open-source projects:

* **DiffSynth Studio** – [https://github.com/modelscope/DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio)
* **Block-Sparse-Attention** – [https://github.com/mit-han-lab/Block-Sparse-Attention](https://github.com/mit-han-lab/Block-Sparse-Attention)
* **taehv** – [https://github.com/madebyollin/taehv](https://github.com/madebyollin/taehv)

---

### 📞 Contact

* **Junhao Zhuang**
  Email: [zhuangjh23@mails.tsinghua.edu.cn](mailto:zhuangjh23@mails.tsinghua.edu.cn)

---

### 📜 Citation

```bibtex
@article{zhuang2025flashvsr,
  title={FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution},
  author={Zhuang, Junhao and Guo, Shi and Cai, Xin and Li, Xiaohui and Liu, Yihao and Yuan, Chun and Xue, Tianfan},
  journal={arXiv preprint arXiv:2510.12747},
  year={2025}
}
```