ai-forever committed: Update README.md
README.md
CHANGED
@@ -3,6 +3,22 @@ license: apache-2.0
 ---
 # Kandinsky-4 flash: Text-to-Video diffusion model
 
+<br><br><br><br>
+
+<div align="center">
+<image src="https://github.com/ai-forever/Kandinsky-4/assets/KANDINSKY_LOGO_1_BLACK.png" ></image>
+</div>
+
+<div align="center">
+<a>Kandinsky 4.0 Post</a> | <a>Project Page</a> | <a>Generate</a> | <a>Telegram-bot</a> | <a>Technical Report</a> | <a href=https://github.com/ai-forever/Kandinsky-4>GitHub</a> | <a href=https://huggingface.co/ai-forever/kandinsky4>HuggingFace</a>
+</div>
+
+<div align="center">
+This repository is the official implementation of Kandinsky-4 flash and Kandinsky-4 Audio.
+</div>
+
+<br><br><br><br>
+
 <table border="0" style="width: 200; text-align: left; margin-top: 20px;">
 <tr>
 <td>
@@ -44,8 +60,6 @@ license: apache-2.0
 
 
 
-[Kandinsky 4.0 Post]() | [Project Page]() | [Generate]() | [Telegram-bot]() | [Technical Report]() | [GitHub](https://github.com/ai-forever/Kandinsky-4) | [HuggingFace](https://huggingface.co/ai-forever/kandinsky4) |
-
 ## Description:
 
 Kandinsky 4.0 is a text-to-video generation model based on latent diffusion for 480p and HD resolutions. Here we present a distilled version of this model, **Kandinsky 4 flash**, which can generate **12-second videos** in 480p resolution in **11 seconds** on a single NVIDIA H100 GPU. The pipeline consists of a 3D causal [CogVideoX](https://arxiv.org/pdf/2408.06072) VAE, the text embedder [T5-V1.1-XXL](https://huggingface.co/google/t5-v1_1-xxl), and our trained MMDiT-like transformer model.
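
The description names three components: a text embedder, a diffusion transformer, and a VAE. As a rough illustration of how a latent-diffusion text-to-video pipeline of this shape is commonly wired together, here is a minimal Python sketch. It is not the repository's API: every class, function, and parameter name is a hypothetical stand-in, the number of denoising steps is an assumption (the README does not state it), and the toy functions only make the data flow runnable.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases; real models operate on tensors, not lists.
Embedding = List[float]
Latent = List[float]
Video = List[float]


@dataclass
class TextToVideoPipeline:
    """Illustrative wiring only; not the Kandinsky-4 implementation."""
    text_embedder: Callable[[str], Embedding]              # role played by T5-V1.1-XXL
    denoiser: Callable[[Latent, Embedding, int], Latent]   # role played by the MMDiT-like transformer
    vae_decode: Callable[[Latent], Video]                  # role played by the 3D causal VAE decoder
    num_steps: int                                         # assumed value; not given in the README

    def __call__(self, prompt: str, initial_latent: Latent) -> Video:
        cond = self.text_embedder(prompt)        # 1. encode the prompt once
        latent = initial_latent
        for step in range(self.num_steps):       # 2. iteratively refine the latent
            latent = self.denoiser(latent, cond, step)
        return self.vae_decode(latent)           # 3. decode the latent into frames


# Toy stand-ins so the sketch runs end to end.
def toy_embedder(prompt: str) -> Embedding:
    return [float(len(prompt))]


def toy_denoiser(latent: Latent, cond: Embedding, step: int) -> Latent:
    return [x * 0.5 for x in latent]


def toy_decode(latent: Latent) -> Video:
    return [x * 2.0 for x in latent]


pipe = TextToVideoPipeline(toy_embedder, toy_denoiser, toy_decode, num_steps=4)
video = pipe("a red fox running through snow", [8.0, 16.0])

# Speed figure from the description: 12 s of video generated in 11 s.
real_time_factor = 12 / 11
```

Read this way, the stated figures mean generation runs slightly faster than playback (12 / 11, about 1.09 seconds of video per second of compute) on the hardware the description cites.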