ai-forever/kandinsky-4-t2v-flash

ai-forever committed about 1 month ago

Commit 56922e3 · verified · 1 Parent(s): cf5a89a

Update README.md

Files changed (1)

README.md +16 -2

@@ -3,6 +3,22 @@ license: apache-2.0
 ---
 # Kandinsky-4 flash: Text-to-Video diffusion model
 <table border="0" style="width: 200; text-align: left; margin-top: 20px;">
   <tr>
       <td>
@@ -44,8 +60,6 @@ license: apache-2.0
-[Kandinsky 4.0 Post]() | [Project Page]() | [Generate]() | [Telegram-bot]() | [Technical Report]() | [GitHub](https://github.com/ai-forever/Kandinsky-4) | [HuggingFace](https://huggingface.co/ai-forever/kandinsky4) |
 ## Description:
 Kandinsky 4.0 is a text-to-video generation model based on latent diffusion for 480p and HD resolutions. Here we present a distilled version of this model, **Kandinsky 4 flash**, which can generate **12-second videos** in 480p resolution in **11 seconds** on a single NVIDIA H100 GPU. The pipeline consists of a 3D causal [CogVideoX](https://arxiv.org/pdf/2408.06072) VAE, the text embedder [T5-V1.1-XXL](https://huggingface.co/google/t5-v1_1-xxl), and our trained MMDiT-like transformer model.

 ---
 # Kandinsky-4 flash: Text-to-Video diffusion model
+<br><br><br><br>
+<div align="center">
+  <image src="https://github.com/ai-forever/Kandinsky-4/assets/KANDINSKY_LOGO_1_BLACK.png" ></image>
+</div>
+<div align="center">
+  <a>Kandinsky 4.0 Post</a> | <a>Project Page</a> | <a>Generate</a> | <a>Telegram-bot</a> | <a>Technical Report</a> | <a href=https://github.com/ai-forever/Kandinsky-4>GitHub</a> | <a href=https://huggingface.co/ai-forever/kandinsky4>HuggingFace</a>
+</div>
+<div align="center">
+  This repository is the official implementation of Kandinsky-4 flash and Kandinsky-4 Audio.
+</div>
+<br><br><br><br>
 <table border="0" style="width: 200; text-align: left; margin-top: 20px;">
   <tr>
       <td>
 ## Description:
 Kandinsky 4.0 is a text-to-video generation model based on latent diffusion for 480p and HD resolutions. Here we present a distilled version of this model, **Kandinsky 4 flash**, which can generate **12-second videos** in 480p resolution in **11 seconds** on a single NVIDIA H100 GPU. The pipeline consists of a 3D causal [CogVideoX](https://arxiv.org/pdf/2408.06072) VAE, the text embedder [T5-V1.1-XXL](https://huggingface.co/google/t5-v1_1-xxl), and our trained MMDiT-like transformer model.
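
The figures in the description can be put in perspective with a little arithmetic. The sketch below is illustrative only and is not part of the Kandinsky code base: `realtime_factor` and `latent_shape` are hypothetical helper names, the 4x temporal / 8x spatial compression and 16 latent channels are taken from the CogVideoX paper's description of its 3D causal VAE, and the 97-frame, 480x720 clip used in the example is an assumed input, since the frame rate and exact 480p dimensions are not stated here.

```python
def realtime_factor(video_seconds: float, generation_seconds: float) -> float:
    """Seconds of video produced per second of wall-clock generation time."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return video_seconds / generation_seconds


def latent_shape(
    num_frames: int,
    height: int,
    width: int,
    temporal_stride: int = 4,   # assumption: CogVideoX-style 4x temporal compression
    spatial_stride: int = 8,    # assumption: CogVideoX-style 8x spatial compression
    latent_channels: int = 16,  # assumption: CogVideoX-style latent channel count
) -> tuple[int, int, int, int]:
    """Shape (channels, frames, height, width) of the latent a causal 3D VAE would produce.

    A causal video VAE encodes the first frame on its own and then groups the
    remaining frames by the temporal stride, so num_frames must be k * stride + 1.
    """
    if num_frames < 1 or (num_frames - 1) % temporal_stride != 0:
        raise ValueError("num_frames must equal k * temporal_stride + 1")
    if height % spatial_stride != 0 or width % spatial_stride != 0:
        raise ValueError("height and width must be divisible by spatial_stride")
    latent_frames = (num_frames - 1) // temporal_stride + 1
    return (
        latent_channels,
        latent_frames,
        height // spatial_stride,
        width // spatial_stride,
    )


if __name__ == "__main__":
    # 12 seconds of video in 11 seconds of compute, as quoted in the description.
    print(round(realtime_factor(12, 11), 2))  # 1.09
    # Hypothetical clip: 97 frames at 480x720 (frame count and width are assumptions).
    print(latent_shape(97, 480, 720))  # (16, 25, 60, 90)
```

A factor above 1 means generation runs slightly faster than playback, which is what the 12-second / 11-second claim amounts to. The latent shape shows why operating in the VAE's compressed space keeps the transformer's input small compared with raw pixels.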