Commit e6aede1 by WanX-Video-1 · 1 parent: 78120a0

init upload
.gitattributes CHANGED
@@ -35,11 +35,9 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  google/umt5-xxl/tokenizer.json filter=lfs diff=lfs merge=lfs -text
  xlm-roberta-large/tokenizer.json filter=lfs diff=lfs merge=lfs -text
- assets/.DS_Store filter=lfs diff=lfs merge=lfs -text
  assets/comp_effic.png filter=lfs diff=lfs merge=lfs -text
  assets/data_for_diff_stage.jpg filter=lfs diff=lfs merge=lfs -text
  assets/i2v_res.png filter=lfs diff=lfs merge=lfs -text
- assets/input.png filter=lfs diff=lfs merge=lfs -text
  assets/logo.png filter=lfs diff=lfs merge=lfs -text
  assets/t2v_res.jpg filter=lfs diff=lfs merge=lfs -text
  assets/vben_1.3b_vs_sota.png filter=lfs diff=lfs merge=lfs -text
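This hunk drops the Git LFS rules for `assets/.DS_Store` and `assets/input.png`, the two assets this commit deletes. As a rough illustration of how such entries are read, the sketch below lists which patterns in a `.gitattributes` text carry `filter=lfs` and diffs two versions. It assumes simple entries only (one unquoted pattern per line, no macro attributes); `lfs_patterns` is a hypothetical helper, not part of Git or Git LFS.

```python
def lfs_patterns(gitattributes_text: str) -> list[str]:
    """Return the path patterns that carry the `filter=lfs` attribute.

    Assumes simple entries: one unquoted pattern per line followed by
    whitespace-separated attributes (no quoting, no macro attributes).
    """
    patterns = []
    for raw in gitattributes_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


# Excerpts of the hunk above, before and after this commit.
before = """\
*tfevents* filter=lfs diff=lfs merge=lfs -text
assets/.DS_Store filter=lfs diff=lfs merge=lfs -text
assets/comp_effic.png filter=lfs diff=lfs merge=lfs -text
assets/input.png filter=lfs diff=lfs merge=lfs -text
"""
after = """\
*tfevents* filter=lfs diff=lfs merge=lfs -text
assets/comp_effic.png filter=lfs diff=lfs merge=lfs -text
"""

# Patterns the commit stops routing through Git LFS.
removed = sorted(set(lfs_patterns(before)) - set(lfs_patterns(after)))
print(removed)  # ['assets/.DS_Store', 'assets/input.png']
```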
README.md CHANGED
@@ -5,12 +5,12 @@
  <p>

  <p align="center">
- 💜 <a href=""><b>Wan</b></a> &nbsp&nbsp | &nbsp&nbsp 🖥️ <a href="https://github.com/Wan-Video/Wan2.1">GitHub</a> &nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="">Paper</a> &nbsp&nbsp | &nbsp&nbsp 📑 <a href="">Blog</a> &nbsp&nbsp | &nbsp&nbsp💬 <a href="">WeChat (微信)</a>&nbsp&nbsp | &nbsp&nbsp 📖 <a href="https://discord.gg/p5XbdQV7">Discord</a>&nbsp&nbsp
+ 💜 <a href=""><b>Wan</b></a> &nbsp&nbsp | &nbsp&nbsp 🖥️ <a href="https://github.com/Wan-Video/Wan2.1">GitHub</a> &nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="">Paper (Coming soon)</a> &nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://wanxai.com">Blog</a> &nbsp&nbsp | &nbsp&nbsp💬 <a href="https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg">WeChat Group</a>&nbsp&nbsp | &nbsp&nbsp 📖 <a href="https://discord.gg/p5XbdQV7">Discord</a>&nbsp&nbsp
  <br>

  -----

- [**Wan: Open and Advanced Large-Scale Video Generative Models**]("#") <be>
+ [**Wan: Open and Advanced Large-Scale Video Generative Models**]() <be>

  In this repository, we present **Wan2.1**, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. **Wan2.1** offers these key features:
  - 👍 **SOTA Performance**: **Wan2.1** consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.
@@ -72,10 +72,10 @@ pip install -r requirements.txt

  | Models | Download Link | Notes |
  | --------------|-------------------------------------------------------------------------------|-------------------------------|
- | T2V-14B | [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) | Supports both 480P and 720P
- | I2V-14B-720P | [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P) | Supports 720P
- | I2V-14B-480P | [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P) | Supports 480P
- | T2V-1.3B | [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) | Supports 480P
+ | T2V-14B | 🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) 🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-14B) | Supports both 480P and 720P
+ | I2V-14B-720P | 🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P) 🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-720P) | Supports 720P
+ | I2V-14B-480P | 🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P) 🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-480P) | Supports 480P
+ | T2V-1.3B | 🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) 🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-1.3B) | Supports 480P

  > 💡Note: The 1.3B model is capable of generating videos at 720P resolution. However, due to limited training at this resolution, the results are generally less stable compared to 480P. For optimal performance, we recommend using 480P resolution.

@@ -83,7 +83,7 @@ pip install -r requirements.txt
  Download models using huggingface-cli:
  ```
  pip install "huggingface_hub[cli]"
- huggingface-cli download --resume-download Wan-AI/Wan2.1-I2V-14B-720P --local-dir ./Wan2.1-I2V-14B-720P
+ huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir ./Wan2.1-I2V-14B-720P
  ```


@@ -126,6 +126,7 @@ Similar to Text-to-Video, Image-to-Video is also divided into processes with and
  python generate.py --task i2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
  ```

+ > 💡For the Image-to-Video task, the `size` parameter represents the area of the generated video, with the aspect ratio following that of the original input image.

  - Multi-GPU inference using FSDP + xDiT USP

@@ -137,8 +138,6 @@ torchrun --nproc_per_node=8 generate.py --task i2v-14B --size 1280*720 --ckpt_di
  ##### (2) Using Prompt Extention


- The process of prompt extension can be referenced [here](#2-using-prompt-extention).
-
  Run with local prompt extention using `Qwen/Qwen2.5-VL-7B-Instruct`:
  ```
  python generate.py --task i2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --use_prompt_extend --prompt_extend_model Qwen/Qwen2.5-VL-7B-Instruct --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
@@ -228,7 +227,7 @@ We curated and deduplicated a candidate dataset comprising a vast amount of imag


  ##### Comparisons to SOTA
- We compared **Wan2.1** with leading open-source and closed-source models to evaluate the performace. Using our carefully designed set of 1,035 internal prompts, we tested across 14 major dimensions and 26 sub-dimensions. Then we calculated the total score through a weighted average based on the importance of each dimension. The detailed results are shown in the table below. These results demonstrate our model's superior performance compared to both open-source and closed-source models.
+ We compared **Wan2.1** with leading open-source and closed-source models to evaluate the performace. Using our carefully designed set of 1,035 internal prompts, we tested across 14 major dimensions and 26 sub-dimensions. We then compute the total score by performing a weighted calculation on the scores of each dimension, utilizing weights derived from human preferences in the matching process. The detailed results are shown in the table below. These results demonstrate our model's superior performance compared to both open-source and closed-source models.

  ![figure1](assets/vben_vs_sota.png "figure1")

@@ -251,9 +250,9 @@ The models in this repository are licensed under the Apache 2.0 License. We clai

  ## Acknowledgements

- We would like to thank the contributors to the [SD3](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [QWen](https://huggingface.co/Qwen), [umt5-xxl](https://huggingface.co/google/umt5-xxl), [diffusers](https://github.com/huggingface/diffusers) and [HuggingFace](https://huggingface.co) repositories, for their open research and exploration.
+ We would like to thank the contributors to the [SD3](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [Qwen](https://huggingface.co/Qwen), [umt5-xxl](https://huggingface.co/google/umt5-xxl), [diffusers](https://github.com/huggingface/diffusers) and [HuggingFace](https://huggingface.co) repositories, for their open research.



  ## Contact Us
- If you would like to leave a message to our research or product teams, feel free to join our [Discord](https://discord.gg/p5XbdQV7) or [WeChat groups]()!
+ If you would like to leave a message to our research or product teams, feel free to join our [Discord](https://discord.gg/p5XbdQV7) or [WeChat groups](https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg)!
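The note added to the README says that for Image-to-Video, `--size` fixes the pixel area of the output while the aspect ratio comes from the input image. A minimal sketch of that arithmetic: solve height × width = area with height / width equal to the image's ratio. The snapping to a multiple of 16 is a hypothetical alignment step, and `i2v_output_size` is an illustrative name; the actual rounding inside `generate.py` is not shown in this commit and may differ.

```python
import math


def i2v_output_size(size: str, image_width: int, image_height: int,
                    multiple: int = 16) -> tuple[int, int]:
    """Sketch: derive (width, height) from a target area and the input image's aspect ratio.

    `multiple` is a hypothetical alignment step, not a value taken from generate.py.
    """
    w_str, h_str = size.split("*")
    area = int(w_str) * int(h_str)          # "1280*720" -> 921600 pixels
    aspect = image_height / image_width     # keep the input image's shape
    height = math.sqrt(area * aspect)       # height * width == area
    width = math.sqrt(area / aspect)        # height / width == aspect
    # Snap each side to the nearest multiple of the alignment step.
    height = round(height / multiple) * multiple
    width = round(width / multiple) * multiple
    return width, height


print(i2v_output_size("1280*720", 1920, 1080))  # (1280, 720)  landscape input
print(i2v_output_size("1280*720", 1080, 1920))  # (720, 1280)  portrait input
print(i2v_output_size("1280*720", 1000, 1000))  # (960, 960)   square input
```

Under this reading, a square input image at `--size 1280*720` yields a 960×960 video rather than a stretched 1280×720 one, which is the behavior the note describes.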
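The rewritten "Comparisons to SOTA" sentence describes the total score as a weighted calculation over per-dimension scores. The sketch below shows a plain normalized weighted average under that reading. The dimension names and weights are hypothetical placeholders: the README's 14 dimensions and its human-preference-derived weights are not published in this commit.

```python
def total_score(dim_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (normalized, so weights need not sum to 1)."""
    missing = dim_scores.keys() - weights.keys()
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    total_weight = sum(weights[d] for d in dim_scores)
    weighted_sum = sum(dim_scores[d] * weights[d] for d in dim_scores)
    return weighted_sum / total_weight


# Hypothetical dimensions and weights, for illustration only.
scores = {"motion_quality": 0.5, "visual_quality": 0.75, "instruction_following": 1.0}
weights = {"motion_quality": 2.0, "visual_quality": 1.0, "instruction_following": 1.0}
print(total_score(scores, weights))  # 0.6875
```

Normalizing by the weight total keeps the result on the same scale as the per-dimension scores, whatever scale the preference-derived weights happen to use.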
assets/.DS_Store DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:d65165279105ca6773180500688df4bdc69a2c7b771752f0a46ef120b7fd8ec3
- size 6148
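The deleted `assets/.DS_Store` content is a Git LFS pointer rather than the file itself: three `key value` lines giving the spec version, the object id (`sha256:` plus a hex digest) and the real file size in bytes. A minimal sketch of reading such a pointer; `parse_lfs_pointer` is a hypothetical helper, and it does not enforce the spec's key ordering or reject unknown keys.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its fields.

    Sketch only: does not enforce the spec's key ordering or reject unknown keys.
    """
    fields = {}
    for line in text.splitlines():
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algorithm": algorithm,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# The pointer removed by this commit for assets/.DS_Store.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d65165279105ca6773180500688df4bdc69a2c7b771752f0a46ef120b7fd8ec3
size 6148
"""
info = parse_lfs_pointer(pointer)
print(info["oid_algorithm"], info["size"])  # sha256 6148
```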
 
 
 
 
assets/comp_effic.png CHANGED

Git LFS Details (before)

  • SHA256: b1b23457157a494ebe834306962e927768830e26a2d51b896929d2d7cba54dd6
  • Pointer size: 132 Bytes
  • Size of remote file: 1.6 MB

Git LFS Details (after)

  • SHA256: b0e225caffb4b31295ad150f95ee852e4c3dde4a00ac8f79a2ff500f2ce26b8d
  • Pointer size: 132 Bytes
  • Size of remote file: 1.79 MB
assets/input.png DELETED

Git LFS Details

  • SHA256: da5825447ffdefe9728c0e99caf7724a258c79d9afc0e4ec47421f16b4bc27b7
  • Pointer size: 132 Bytes
  • Size of remote file: 1.07 MB
assets/vben_vs_sota.png CHANGED

Git LFS Details (before)

  • SHA256: d32d27b128f46b6d3abe3cdaec4966629fcb86ae7658679ed1c985eec8541c4b
  • Pointer size: 131 Bytes
  • Size of remote file: 584 kB

Git LFS Details (after)

  • SHA256: 9a0e86ca85046d2675f97984b88b6e74df07bba8a62a31ab8a1aef50d4eda44e
  • Pointer size: 132 Bytes
  • Size of remote file: 1.55 MB
assets/video_vae_res.jpg CHANGED

Git LFS Details (before)

  • SHA256: 4e98374a200c3a0b3a4d1322d4d3dfe33ff62019812a6338c947cfd21efbfc5f
  • Pointer size: 131 Bytes
  • Size of remote file: 212 kB

Git LFS Details (after)

  • SHA256: d8f9e7f7353848056a615c8ef35ab86ec22976bb46cb27405008b4089701945c
  • Pointer size: 131 Bytes
  • Size of remote file: 213 kB
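The SHA256 values listed under Git LFS Details are the LFS object ids, that is, the SHA-256 of each file's full contents. To check that a locally downloaded asset matches one of those entries, a streaming hash is enough. This is a minimal sketch; `lfs_oid` and the command-line usage are hypothetical, not part of the Git LFS CLI.

```python
import hashlib
import sys
from pathlib import Path


def lfs_oid(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest that Git LFS uses as a file's oid."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so multi-megabyte assets never need to be held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical usage: python verify_lfs.py <path> <expected_sha256>
    if len(sys.argv) == 3:
        ok = lfs_oid(Path(sys.argv[1])) == sys.argv[2].lower()
        print("match" if ok else "MISMATCH")
```

For example, passing `assets/comp_effic.png` together with the SHA256 shown for its current version should print `match` for an intact download.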