ai-forever commited on
Commit
4ad85e9
1 Parent(s): f0d4c1c

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +133 -0
README.md ADDED
@@ -0,0 +1,133 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ ---
4
+ # Kandinsky Video — a new text-to-video generation model
5
+ ## SoTA quality among open-source solutions
6
+
7
+ This repository is the official implementation of Kandinsky Video model
8
+
9
+
10
+ Paper | [Project](https://ai-forever.github.io/kandinsky-video/) | ![Hugging Face Spaces](https://img.shields.io/badge/🤗-Huggingface-yello.svg) | [Telegram-bot](https://t.me/video_kandinsky_bot) | Habr post
11
+
12
+
13
+ <p align="center">
14
+ <img src="__assets__/title.JPG" width="800px"/>
15
+ <br>
16
+ <em>Kandinsky Video is a text-to-video generation model, which is based on the FusionFrames architecture, consisting of two main stages: keyframe generation and interpolation. Our approach for temporal conditioning allows us to generate videos with high-quality appearance, smoothness and dynamics.</em>
17
+ </p>
18
+
19
+
20
+
21
+ ## Pipeline
22
+
23
+ <p align="center">
24
+ <img src="__assets__/pipeline.jpg" width="800px"/>
25
+ <br>
26
+ <em>The encoded text prompt enters the U-Net keyframe generation model with temporal layers or blocks, and then the sampled latent keyframes are sent to the latent interpolation model in such a way as to predict three interpolation frames between two keyframes. A temporal MoVQ-GAN decoder is used to get the final video result.</em>
27
+ </p>
28
+
29
+
30
+ **Architecture details**
31
+
32
+ + Text encoder (Flan-UL2) - 8.6B
33
+ + Latent Diffusion U-Net3D - 4.0B
34
+ + MoVQ encoder/decoder - 256M
35
+
36
+
37
+ ## How to use
38
+
39
+ Check our jupyter notebooks with examples in `./examples` folder
40
+ ### 1. text2video
41
+
42
+ ```python
43
+ from video_kandinsky3 import get_T2V_pipeline
44
+
45
+ t2v_pipe = get_T2V_pipeline('cuda', fp16=True)
46
+
47
+ pfps = 'medium' # ['low', 'medium', 'high']
48
+ video = t2v_pipe(
49
+ 'a red car is drifting on the mountain road, close view, fast movement',
50
+ width=640, height=384, fps=fps
51
+ )
52
+ ```
53
+
54
+
55
+ ## Results
56
+
57
+
58
+ <table class="center">
59
+ <tr>
60
+ <td><img src="__assets__/results/A car moving on the road from the sea to the mountains.gif" raw=true></td>
61
+ <td><img src="__assets__/results/A red car drifting, 4k video.gif"></td>
62
+ <td><img src="__assets__/results/chemistry laboratory, chemical explosion, 4k.gif"></td>
63
+ <td><img src="__assets__/results/Erupting volcano_ raw power, molten lava, and the forces of the Earth.gif"></td>
64
+ </tr>
65
+ <tr>
66
+ <td width=25% align="center">"A car moving on the road from the sea to the mountains"</td>
67
+ <td width=25% align="center">"A red car drifting, 4k video"</td>
68
+ <td width=25% align="center">"Chemistry laboratory, chemical explosion, 4k"</td>
69
+ <td width=25% align="center">"Erupting volcano raw power, molten lava, and the forces of the Earth"</td>
70
+ </tr>
71
+
72
+ <tr>
73
+ <td><img src="__assets__/results/luminescent jellyfish swims underwater, neon, 4k.gif" raw=true></td>
74
+ <td><img src="__assets__/results/Majestic waterfalls in a lush rainforest_ power, mist, and biodiversity.gif"></td>
75
+ <td><img src="__assets__/results/white ghost flies through a night clearing, 4k.gif"></td>
76
+ <td><img src="__assets__/results/Wildlife migration_ herds on the move, crossing landscapes in harmony.gif"></td>
77
+ </tr>
78
+ <tr>
79
+ <td width=25% align="center">"Luminescent jellyfish swims underwater, neon, 4k"</td>
80
+ <td width=25% align="center">"Majestic waterfalls in a lush rainforest power, mist, and biodiversity"</td>
81
+ <td width=25% align="center">"White ghost flies through a night clearing, 4k"</td>
82
+ <td width=25% align="center">"Wildlife migration herds on the move, crossing landscapes in harmony"</td>
83
+ </tr>
84
+
85
+ <tr>
86
+ <td><img src="__assets__/results/Majestic humpback whale breaching_ power, grace, and ocean spectacle.gif" raw=true></td>
87
+ <td><img src="__assets__/results/Evoke the sense of wonder in a time-lapse journey through changing seasons..gif"></td>
88
+ <td><img src="__assets__/results/Explore the fascinating world of underwater creatures in a visually stunning sequence.gif"></td>
89
+ <td><img src="__assets__/results/Polar ice caps_ the pristine wilderness of the Arctic and Antarctic.gif"></td>
90
+ </tr>
91
+ <tr>
92
+ <td width=25% align="center">"Majestic humpback whale breaching power, grace, and ocean spectacle"</td>
93
+ <td width=25% align="center">"Evoke the sense of wonder in a time-lapse journey through changing seasons"</td>
94
+ <td width=25% align="center">"Explore the fascinating world of underwater creatures in a visually stunning sequence"</td>
95
+ <td width=25% align="center">"Polar ice caps the pristine wilderness of the Arctic and Antarctic"</td>
96
+ </tr>
97
+
98
+
99
+ <tr>
100
+ <td><img src="__assets__/results/Rolling waves on a sandy beach_ relaxation, rhythm, and coastal beauty.gif" raw=true></td>
101
+ <td><img src="__assets__/results/Sloth in slow motion_ deliberate movements, relaxation, and arboreal life.gif"></td>
102
+ <td><img src="__assets__/results/Time-lapse of a flower blooming_ growth, beauty, and the passage of time..gif"></td>
103
+ <td><img src="__assets__/results/Craft a heartwarming narrative showcasing the bond between a human and their loyal pet companion..gif"></td>
104
+ </tr>
105
+ <tr>
106
+ <td width=25% align="center">"Rolling waves on a sandy beach relaxation, rhythm, and coastal beauty"</td>
107
+ <td width=25% align="center">"Sloth in slow motion deliberate movements, relaxation, and arboreal life"</td>
108
+ <td width=25% align="center">"Time-lapse of a flower blooming growth, beauty, and the passage of time"</td>
109
+ <td width=25% align="center">"Craft a heartwarming narrative showcasing the bond between a human and their loyal pet companion"</td>
110
+ </tr>
111
+
112
+
113
+ </table>
114
+
115
+
116
+ # Authors
117
+
118
+ + Vladimir Arkhipkin: [Github](https://github.com/oriBetelgeuse), [Google Scholar](https://scholar.google.com/citations?user=D-Ko0oAAAAAJ&hl=ru)
119
+ + Zein Shaheen: [Github](https://github.com/zeinsh), [Google Scholar](https://scholar.google.ru/citations?user=bxlgMxMAAAAJ&hl=en)
120
+ + Viacheslav Vasilev: [Github](https://github.com/vivasilev), [Google Scholar](https://scholar.google.com/citations?user=redAz-kAAAAJ&hl=ru&oi=sra)
121
+ + Igor Pavlov: [Github](https://github.com/boomb0om)
122
+ + Elizaveta Dakhova: [Github](https://github.com/LizaDakhova)
123
+ + Anastasia Lysenko: [Github](https://github.com/LysenkoAnastasia)
124
+ + Sergey Markov
125
+ + Denis Dimitrov: [Github](https://github.com/denndimitrov), [Google Scholar](https://scholar.google.com/citations?user=3JSIJpYAAAAJ&hl=ru&oi=ao)
126
+ + Andrey Kuznetsov: [Github](https://github.com/kuznetsoffandrey), [Google Scholar](https://scholar.google.com/citations?user=q0lIfCEAAAAJ&hl=ru)
127
+
128
+
129
+ ## BibTeX
130
+ If you use our work in your research, please cite our publication:
131
+ ```
132
+ TBD
133
+ ```