init upload

Files changed (7)

.gitattributes +0 -2
README.md +11 -12
assets/.DS_Store +0 -3
assets/comp_effic.png +2 -2
assets/input.png +0 -3
assets/vben_vs_sota.png +2 -2
assets/video_vae_res.jpg +2 -2

.gitattributes CHANGED

@@ -35,11 +35,9 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 google/umt5-xxl/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 xlm-roberta-large/tokenizer.json filter=lfs diff=lfs merge=lfs -text
-assets/.DS_Store filter=lfs diff=lfs merge=lfs -text
 assets/comp_effic.png filter=lfs diff=lfs merge=lfs -text
 assets/data_for_diff_stage.jpg filter=lfs diff=lfs merge=lfs -text
 assets/i2v_res.png filter=lfs diff=lfs merge=lfs -text
-assets/input.png filter=lfs diff=lfs merge=lfs -text
 assets/logo.png filter=lfs diff=lfs merge=lfs -text
 assets/t2v_res.jpg filter=lfs diff=lfs merge=lfs -text
 assets/vben_1.3b_vs_sota.png filter=lfs diff=lfs merge=lfs -text

 *tfevents* filter=lfs diff=lfs merge=lfs -text
 google/umt5-xxl/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 xlm-roberta-large/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 assets/comp_effic.png filter=lfs diff=lfs merge=lfs -text
 assets/data_for_diff_stage.jpg filter=lfs diff=lfs merge=lfs -text
 assets/i2v_res.png filter=lfs diff=lfs merge=lfs -text
 assets/logo.png filter=lfs diff=lfs merge=lfs -text
 assets/t2v_res.jpg filter=lfs diff=lfs merge=lfs -text
 assets/vben_1.3b_vs_sota.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -5,12 +5,12 @@
 <p>
 <p align="center">
-    💜 <a href=""><b>Wan</b></a> &nbsp&nbsp ｜ &nbsp&nbsp 🖥️ <a href="https://github.com/Wan-Video/Wan2.1">GitHub</a> &nbsp&nbsp  | &nbsp&nbsp🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="">Paper</a> &nbsp&nbsp | &nbsp&nbsp 📑 <a href="">Blog</a> &nbsp&nbsp | &nbsp&nbsp💬 <a href="">WeChat (微信)</a>&nbsp&nbsp | &nbsp&nbsp 📖 <a href="https://discord.gg/p5XbdQV7">Discord</a>&nbsp&nbsp
 <br>
 -----
-[**Wan: Open and Advanced Large-Scale Video Generative Models**]("#") <be>
 In this repository, we present **Wan2.1**, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. **Wan2.1** offers these key features:
 - 👍 **SOTA Performance**: **Wan2.1** consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.
@@ -72,10 +72,10 @@ pip install -r requirements.txt
 | Models        |                       Download Link                                           |    Notes                      |
 | --------------|-------------------------------------------------------------------------------|-------------------------------|
-| T2V-14B       |      [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B)            | Supports both 480P and 720P
-| I2V-14B-720P  |      [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P)       | Supports 720P
-| I2V-14B-480P  |      [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P)       | Supports 480P
-| T2V-1.3B      |      [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B)           | Supports 480P
 > 💡Note: The 1.3B model is capable of generating videos at 720P resolution. However, due to limited training at this resolution, the results are generally less stable compared to 480P. For optimal performance, we recommend using 480P resolution.
@@ -83,7 +83,7 @@ pip install -r requirements.txt
 Download models using huggingface-cli:
 ```
 pip install "huggingface_hub[cli]"
-huggingface-cli download --resume-download Wan-AI/Wan2.1-I2V-14B-720P --local-dir ./Wan2.1-I2V-14B-720P
 ```
@@ -126,6 +126,7 @@ Similar to Text-to-Video, Image-to-Video is also divided into processes with and
 python generate.py --task i2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
 ```
 - Multi-GPU inference using FSDP + xDiT USP
@@ -137,8 +138,6 @@ torchrun --nproc_per_node=8 generate.py --task i2v-14B --size 1280*720 --ckpt_di
 ##### (2) Using Prompt Extension
-The process of prompt extension can be referenced [here](#2-using-prompt-extention).
 Run with local prompt extension using `Qwen/Qwen2.5-VL-7B-Instruct`:
 ```
 python generate.py --task i2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --use_prompt_extend --prompt_extend_model Qwen/Qwen2.5-VL-7B-Instruct --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
@@ -228,7 +227,7 @@ We curated and deduplicated a candidate dataset comprising a vast amount of imag
 ##### Comparisons to SOTA
-We compared **Wan2.1** with leading open-source and closed-source models to evaluate the performance. Using our carefully designed set of 1,035 internal prompts, we tested across 14 major dimensions and 26 sub-dimensions. Then we calculated the total score through a weighted average based on the importance of each dimension. The detailed results are shown in the table below. These results demonstrate our model's superior performance compared to both open-source and closed-source models.
 ![figure1](assets/vben_vs_sota.png "figure1")
@@ -251,9 +250,9 @@ The models in this repository are licensed under the Apache 2.0 License. We clai
 ## Acknowledgements
-We would like to thank the contributors to the [SD3](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [QWen](https://huggingface.co/Qwen), [umt5-xxl](https://huggingface.co/google/umt5-xxl), [diffusers](https://github.com/huggingface/diffusers) and [HuggingFace](https://huggingface.co) repositories, for their open research and exploration.
 ## Contact Us
-If you would like to leave a message to our research or product teams, feel free to join our [Discord](https://discord.gg/p5XbdQV7) or [WeChat groups]()!

 <p>
 <p align="center">
+    💜 <a href=""><b>Wan</b></a> &nbsp&nbsp ｜ &nbsp&nbsp 🖥️ <a href="https://github.com/Wan-Video/Wan2.1">GitHub</a> &nbsp&nbsp  | &nbsp&nbsp🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="">Paper (Coming soon)</a> &nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://wanxai.com">Blog</a> &nbsp&nbsp | &nbsp&nbsp💬 <a href="https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg">WeChat Group</a>&nbsp&nbsp | &nbsp&nbsp 📖 <a href="https://discord.gg/p5XbdQV7">Discord</a>&nbsp&nbsp
 <br>
 -----
+[**Wan: Open and Advanced Large-Scale Video Generative Models**]() <be>
 In this repository, we present **Wan2.1**, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. **Wan2.1** offers these key features:
 - 👍 **SOTA Performance**: **Wan2.1** consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.
 | Models        |                       Download Link                                           |    Notes                      |
 | --------------|-------------------------------------------------------------------------------|-------------------------------|
+| T2V-14B       |      🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B)      🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-14B)          | Supports both 480P and 720P
+| I2V-14B-720P  |      🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P)    🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-720P)     | Supports 720P
+| I2V-14B-480P  |      🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P)    🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-480P)      | Supports 480P
+| T2V-1.3B      |      🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B)     🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-1.3B)         | Supports 480P
 > 💡Note: The 1.3B model is capable of generating videos at 720P resolution. However, due to limited training at this resolution, the results are generally less stable compared to 480P. For optimal performance, we recommend using 480P resolution.
 Download models using huggingface-cli:
 ```
 pip install "huggingface_hub[cli]"
+huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir ./Wan2.1-I2V-14B-720P
 ```
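
The download step can also be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed. The helper names `repo_id`, `local_dir`, and `download` are hypothetical and are not part of the Wan2.1 repository; the model names are taken from the download table in this README.

```python
# Hypothetical helpers for fetching a Wan2.1 checkpoint listed in the README table.
MODELS = ["T2V-14B", "I2V-14B-720P", "I2V-14B-480P", "T2V-1.3B"]


def repo_id(model: str) -> str:
    """Map a table entry such as 'I2V-14B-720P' to its Hugging Face repo id."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    return f"Wan-AI/Wan2.1-{model}"


def local_dir(model: str, root: str = ".") -> str:
    """Mirror the --local-dir convention used in the CLI example."""
    return f"{root}/Wan2.1-{model}"


def download(model: str, root: str = ".") -> str:
    """Download the full checkpoint and return the local path."""
    # Imported lazily so the helpers above work without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id(model), local_dir=local_dir(model, root))


if __name__ == "__main__":
    # Roughly equivalent to the huggingface-cli command shown above.
    print(download("I2V-14B-720P"))
```

The returned directory is what the `generate.py` commands in this README pass as `--ckpt_dir`.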
 python generate.py --task i2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
 ```
+> 💡For the Image-to-Video task, the `size` parameter represents the area of the generated video, with the aspect ratio following that of the original input image.
 - Multi-GPU inference using FSDP + xDiT USP
 ##### (2) Using Prompt Extension
 Run with local prompt extension using `Qwen/Qwen2.5-VL-7B-Instruct`:
 ```
 python generate.py --task i2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --use_prompt_extend --prompt_extend_model Qwen/Qwen2.5-VL-7B-Instruct --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
 ##### Comparisons to SOTA
+We compared **Wan2.1** with leading open-source and closed-source models to evaluate the performance. Using our carefully designed set of 1,035 internal prompts, we tested across 14 major dimensions and 26 sub-dimensions. We then computed the total score by performing a weighted calculation on the scores of each dimension, utilizing weights derived from human preferences in the matching process. The detailed results are shown in the table below. These results demonstrate our model's superior performance compared to both open-source and closed-source models.
 ![figure1](assets/vben_vs_sota.png "figure1")
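
The weighted total described in the comparison paragraph can be illustrated with a short sketch. The README does not publish the dimension names, the weights, or the exact formula, so everything below is an assumption: the function name `weighted_total`, the example dimensions, and the choice to normalize by the sum of the weights are all hypothetical.

```python
def weighted_total(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-dimension scores into one total using per-dimension weights.

    Assumption: the total is a weighted mean (normalized by the weight sum).
    """
    missing = set(scores) - set(weights)
    if missing:
        raise KeyError(f"no weight for dimensions: {sorted(missing)}")
    weight_sum = sum(weights[name] for name in scores)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / weight_sum


if __name__ == "__main__":
    # Made-up dimensions and values, for illustration only.
    example_scores = {"motion": 0.8, "text_rendering": 0.6}
    example_weights = {"motion": 3.0, "text_rendering": 1.0}
    print(weighted_total(example_scores, example_weights))
```

Normalizing keeps the total on the same scale as the individual scores, which makes totals comparable across models that were scored on the same dimensions.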
 ## Acknowledgements
+We would like to thank the contributors to the [SD3](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [Qwen](https://huggingface.co/Qwen), [umt5-xxl](https://huggingface.co/google/umt5-xxl), [diffusers](https://github.com/huggingface/diffusers) and [HuggingFace](https://huggingface.co) repositories, for their open research.
 ## Contact Us
+If you would like to leave a message to our research or product teams, feel free to join our [Discord](https://discord.gg/p5XbdQV7) or [WeChat groups](https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg)!
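
The note this commit adds for the Image-to-Video task says that `size` represents the area of the generated video, with the aspect ratio following the input image. A minimal sketch of that idea follows. It is not the code in `generate.py`: the function name `target_size` and the rounding of each side down to a multiple of 16 are assumptions made for illustration.

```python
import math


def target_size(size: str, image_w: int, image_h: int, multiple: int = 16) -> tuple[int, int]:
    """Return (width, height) with roughly the area given by `size`
    and the aspect ratio of the input image.

    `size` uses the README's 'W*H' form, e.g. '1280*720'.
    Rounding down to `multiple` is an assumed constraint, not a documented one.
    """
    w_str, h_str = size.split("*")
    area = int(w_str) * int(h_str)   # only the product is used
    aspect = image_h / image_w       # height over width of the input image
    height = math.sqrt(area * aspect)
    width = math.sqrt(area / aspect)

    def snap(value: float) -> int:
        # Round down so the output area never exceeds the requested area.
        return max(multiple, int(value) // multiple * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    print(target_size("1280*720", 1280, 720))    # landscape input
    print(target_size("1280*720", 1000, 1000))   # square input
```

Under these assumptions, a square input with `--size 1280*720` produces a square output of about the same pixel count rather than a 1280 by 720 frame.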