ai-forever committed on
Commit
56922e3
·
verified ·
1 Parent(s): cf5a89a

Update README.md

Files changed (1)
  1. README.md +16 -2
README.md CHANGED
@@ -3,6 +3,22 @@ license: apache-2.0
3
  ---
4
  # Kandinsky-4 flash: Text-to-Video diffusion model
5
 
6
  <table border="0" style="width: 200; text-align: left; margin-top: 20px;">
7
  <tr>
8
  <td>
@@ -44,8 +60,6 @@ license: apache-2.0
44
 
45
 
46
 
47
- [Kandinsky 4.0 Post]() | [Project Page]() | [Generate]() | [Telegram-bot]() | [Technical Report]() | [GitHub](https://github.com/ai-forever/Kandinsky-4) | [HuggingFace](https://huggingface.co/ai-forever/kandinsky4) |
48
-
49
  ## Description:
50
 
51
  Kandinsky 4.0 is a text-to-video generation model based on latent diffusion for 480p and HD resolutions. Here we present a distilled version of this model, **Kandinsky 4 flash**, which can generate **12-second videos** at 480p resolution in **11 seconds** on a single NVIDIA H100 GPU. The pipeline consists of a 3D causal [CogVideoX](https://arxiv.org/pdf/2408.06072) VAE, the text embedder [T5-V1.1-XXL](https://huggingface.co/google/t5-v1_1-xxl), and our trained MMDiT-like transformer model.
 
3
  ---
4
  # Kandinsky-4 flash: Text-to-Video diffusion model
5
 
6
+ <br><br><br><br>
7
+
8
+ <div align="center">
9
+ <image src="https://github.com/ai-forever/Kandinsky-4/assets/KANDINSKY_LOGO_1_BLACK.png" ></image>
10
+ </div>
11
+
12
+ <div align="center">
13
+ <a>Kandinsky 4.0 Post</a> | <a>Project Page</a> | <a>Generate</a> | <a>Telegram-bot</a> | <a>Technical Report</a> | <a href=https://github.com/ai-forever/Kandinsky-4>GitHub</a> | <a href=https://huggingface.co/ai-forever/kandinsky4>HuggingFace</a>
14
+ </div>
15
+
16
+ <div align="center">
17
+ This repository is the official implementation of Kandinsky-4 flash and Kandinsky-4 Audio.
18
+ </div>
19
+
20
+ <br><br><br><br>
21
+
22
  <table border="0" style="width: 200; text-align: left; margin-top: 20px;">
23
  <tr>
24
  <td>
 
60
 
61
 
62
 
63
  ## Description:
64
 
65
  Kandinsky 4.0 is a text-to-video generation model based on latent diffusion for 480p and HD resolutions. Here we present a distilled version of this model, **Kandinsky 4 flash**, which can generate **12-second videos** at 480p resolution in **11 seconds** on a single NVIDIA H100 GPU. The pipeline consists of a 3D causal [CogVideoX](https://arxiv.org/pdf/2408.06072) VAE, the text embedder [T5-V1.1-XXL](https://huggingface.co/google/t5-v1_1-xxl), and our trained MMDiT-like transformer model.
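
To put the speed figures quoted in the description in perspective, the short sketch below computes how many seconds of video are produced per second of generation time. It is only an illustration of the arithmetic: the helper name `real_time_factor` is hypothetical and is not part of the Kandinsky-4 repository, and the inputs are simply the 12-second and 11-second figures stated above.

```python
def real_time_factor(video_seconds: float, generation_seconds: float) -> float:
    """Seconds of video produced per second of wall-clock generation time.

    Hypothetical helper for illustration only; not part of the Kandinsky-4 code.
    """
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return video_seconds / generation_seconds


# Figures quoted in the description: 12 s of 480p video in 11 s on one H100.
rtf = real_time_factor(12, 11)
print(f"real-time factor: {rtf:.2f}x")  # prints "real-time factor: 1.09x"
```

A value above 1 means the clip is generated faster than it takes to play back, which is what the quoted figures imply for the 480p setting.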