aiqtech committed
Commit 1990052
1 Parent(s): da2eaea

Update README.md

Files changed (1)
  1. README.md +140 -140
README.md CHANGED
@@ -1,141 +1,141 @@
- sdk_version: 4.37.2
+ sdk_version: 4.42.0

---
title: Cinemo
app_file: demo.py
sdk: gradio
sdk_version: 4.42.0
tags:
- Image-2-Video
- LLM
- Large Language Model
short_description: Multimodal Image-to-Video
emoji: 🎥
colorFrom: green
colorTo: indigo
---
## Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models<br><sub>Official PyTorch Implementation</sub>

[![Arxiv](https://img.shields.io/badge/Arxiv-b31b1b.svg)](https://arxiv.org/abs/2407.15642)
[![Project Page](https://img.shields.io/badge/Project-Website-blue)](https://maxin-cn.github.io/cinemo_project/)

This repo contains pre-trained weights and sampling code for our paper exploring image animation with motion diffusion models (Cinemo). You can find more visualizations on our [project page](https://maxin-cn.github.io/cinemo_project/).

In this project, we propose a novel method called Cinemo, which can perform motion-controllable image animation with strong consistency and smoothness. To improve motion smoothness, Cinemo learns the distribution of motion residuals rather than directly generating subsequent frames. Additionally, we propose a method based on the structural similarity index to control the motion intensity, and a noise refinement technique based on the discrete cosine transform to ensure temporal consistency. Together, these three techniques let Cinemo generate highly consistent, smooth, and motion-controllable image animation results. Compared to previous methods, Cinemo offers simpler and more precise user control and better generative performance.
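
To give a concrete feel for the structural-similarity-based control, the sketch below scores a clip's motion intensity as the average dissimilarity between consecutive frames. This is only an illustration of the idea using scikit-image; the `motion_intensity` helper and the dummy frames are made up for this example and are not the implementation used in Cinemo.

```python
import numpy as np
from skimage.metrics import structural_similarity

def motion_intensity(frames: list[np.ndarray]) -> float:
    """Average (1 - SSIM) over consecutive grayscale frames.

    Higher values mean larger frame-to-frame change, i.e. stronger motion,
    so the score can serve as a simple motion-intensity condition.
    """
    scores = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        ssim = structural_similarity(prev, curr, data_range=1.0)
        scores.append(1.0 - ssim)
    return float(np.mean(scores))

# Dummy 8-frame clip of 64x64 grayscale images with values in [0, 1].
clip = [np.random.rand(64, 64).astype(np.float32) for _ in range(8)]
print(f"motion intensity: {motion_intensity(clip):.3f}")
```
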
<div align="center">
<img src="visuals/pipeline.svg">
</div>

## News

- (🔥 New) Jul. 23, 2024. 💥 Our paper is released on [arXiv](https://arxiv.org/abs/2407.15642).

- (🔥 New) Jun. 2, 2024. 💥 The inference code is released. The checkpoint can be found [here](https://huggingface.co/maxin-cn/Cinemo/tree/main); a snippet for fetching it manually is shown right after this list.
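
The animation and editing scripts download these weights on demand, but if you would rather pre-fetch them (for example, on a machine that should not hit the network at run time), a small sketch using `huggingface_hub` is shown below; the local directory name is just an example.

```python
from huggingface_hub import snapshot_download

# Download the files of the Cinemo checkpoint repo into a local folder.
# The target directory is an arbitrary choice for this example.
local_path = snapshot_download(
    repo_id="maxin-cn/Cinemo",
    local_dir="checkpoints/Cinemo",
)
print("checkpoints saved to:", local_path)
```
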
## Setup

First, download and set up the repo:

```bash
git clone https://github.com/maxin-cn/Cinemo
cd Cinemo
```

We provide an [`environment.yml`](environment.yml) file that can be used to create a Conda environment. If you only want to run pre-trained models locally on CPU, you can remove the `cudatoolkit` and `pytorch-cuda` requirements from the file.

```bash
conda env create -f environment.yml
conda activate cinemo
```
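
After activating the environment, a quick sanity check that PyTorch is importable and (if you kept the CUDA packages) that a GPU is visible:

```python
import torch

# Report the installed PyTorch version and whether a CUDA device is visible.
# On a CPU-only install, the second line simply prints False.
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```
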
## Animation

You can sample from our **pre-trained Cinemo models** with [`animation.py`](pipelines/animation.py). Weights for our pre-trained Cinemo model can be found [here](https://huggingface.co/maxin-cn/Cinemo/tree/main). The script has various arguments for adjusting the number of sampling steps, changing the classifier-free guidance scale, etc.:

```bash
bash pipelines/animation.sh
```
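
For readers unfamiliar with the guidance-scale argument, classifier-free guidance conventionally blends the conditional and unconditional noise predictions as sketched below. This is the generic formulation with made-up tensor shapes, not a quote of the code in `animation.py`.

```python
import torch

def apply_cfg(noise_uncond: torch.Tensor,
              noise_cond: torch.Tensor,
              guidance_scale: float) -> torch.Tensor:
    """Standard classifier-free guidance: push the prediction away from the
    unconditional branch by a factor of `guidance_scale`."""
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

# Dummy tensors standing in for the two denoiser outputs at one sampling step.
uncond = torch.zeros(1, 4, 16, 32, 32)   # hypothetical latent-video shape
cond = torch.randn(1, 4, 16, 32, 32)
guided = apply_cfg(uncond, cond, guidance_scale=7.5)
print(guided.shape)
```
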
All related checkpoints will be downloaded automatically, and then you will get the following results:

<table style="width:100%; text-align:center;">
<tr>
<td align="center">Input image</td>
<td align="center">Output video</td>
<td align="center">Input image</td>
<td align="center">Output video</td>
</tr>
<tr>
<td align="center"><img src="visuals/animations/people_walking/0.jpg" width="100%"></td>
<td align="center"><img src="visuals/animations/people_walking/people_walking.gif" width="100%"></td>
<td align="center"><img src="visuals/animations/sea_swell/0.jpg" width="100%"></td>
<td align="center"><img src="visuals/animations/sea_swell/sea_swell.gif" width="100%"></td>
</tr>
<tr>
<td align="center" colspan="2">"People Walking"</td>
<td align="center" colspan="2">"Sea Swell"</td>
</tr>
<tr>
<td align="center"><img src="visuals/animations/girl_dancing_under_the_stars/0.jpg" width="100%"></td>
<td align="center"><img src="visuals/animations/girl_dancing_under_the_stars/girl_dancing_under_the_stars.gif" width="100%"></td>
<td align="center"><img src="visuals/animations/dragon_glowing_eyes/0.jpg" width="100%"></td>
<td align="center"><img src="visuals/animations/dragon_glowing_eyes/dragon_glowing_eyes.gif" width="100%"></td>
</tr>
<tr>
<td align="center" colspan="2">"Girl Dancing under the Stars"</td>
<td align="center" colspan="2">"Dragon Glowing Eyes"</td>
</tr>
</table>

## Other Applications

You can also utilize Cinemo for other applications, such as motion transfer and video editing:

```bash
bash pipelines/video_editing.sh
```

All related checkpoints will be downloaded automatically, and you will get the following results:

<table style="width:100%; text-align:center;">
<tr>
<td align="center">Input video</td>
<td align="center">First frame</td>
<td align="center">Edited first frame</td>
<td align="center">Output video</td>
</tr>
<tr>
<td align="center"><img src="visuals/video_editing/origin/a_corgi_walking_in_the_park_at_sunrise_oil_painting_style.gif" width="100%"></td>
<td align="center"><img src="visuals/video_editing/origin/0.jpg" width="100%"></td>
<td align="center"><img src="visuals/video_editing/edit/0.jpg" width="100%"></td>
<td align="center"><img src="visuals/video_editing/edit/editing_a_corgi_walking_in_the_park_at_sunrise_oil_painting_style.gif" width="100%"></td>
</tr>
</table>
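
In this workflow, the edited first frame is prepared with an external image editor or editing model. If you want to try it on your own clip, one minimal way to pull out frame 0 for editing is shown below (using Pillow; the file names are placeholders).

```python
from PIL import Image

# Save the first frame of an animated GIF as a still image that can then be
# edited before being handed to the video-editing pipeline.
with Image.open("my_clip.gif") as clip:      # placeholder input clip
    clip.seek(0)                             # jump to frame 0
    clip.convert("RGB").save("first_frame.jpg")
```
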
## Citation
If you find this work useful for your research, please consider citing it.
```bibtex
@article{ma2024cinemo,
  title={Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models},
  author={Ma, Xin and Wang, Yaohui and Jia, Gengyun and Chen, Xinyuan and Li, Yuan-Fang and Chen, Cunjian and Qiao, Yu},
  journal={arXiv preprint arXiv:2407.15642},
  year={2024}
}
```

## Acknowledgments
Cinemo has been greatly inspired by the following amazing works and teams: [LaVie](https://github.com/Vchitect/LaVie) and [SEINE](https://github.com/Vchitect/SEINE). We thank all the contributors for open-sourcing their work.

## License
The code and model weights are licensed under [LICENSE](LICENSE).