dayoucdy committed
Commit 1e7b71d
Parent: d6b1e69

Update README.md

Files changed (1)
README.md +4 -4
README.md CHANGED
@@ -5,16 +5,16 @@ tags:
 duplicated_from: diffusers/text-to-video-ms-1.7b
 ---
 
+# Text-to-video-synthesis Model in Open Domain
+
+This model is based on a multi-stage text-to-video generation diffusion model, which inputs a description text and returns a video that matches the text description. Only English input is supported.
+
 **We Are Hiring!** (Based in Beijing / Hangzhou, China.)
 
 If you're looking for an exciting challenge and the opportunity to work with cutting-edge technologies in AIGC and large-scale pretraining, then we are the place for you. We are looking for talented, motivated and creative individuals to join our team. If you are interested, please send your CV to us.
 
 EMAIL: yingya.zyy@alibaba-inc.com
 
-# Text-to-video-synthesis Model in Open Domain
-
-This model is based on a multi-stage text-to-video generation diffusion model, which inputs a description text and returns a video that matches the text description. Only English input is supported.
-
 ## Model description
 
 The text-to-video generation diffusion model consists of three sub-networks: text feature extraction model, text feature-to-video latent space diffusion model, and video latent space to video visual space model. The overall model parameters are about 1.7 billion. Currently, it only supports English input. The diffusion model adopts a UNet3D structure, and implements video generation through the iterative denoising process from the pure Gaussian noise video.
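
For context on the "Model description" paragraph in the diff above, here is a minimal usage sketch, not part of this commit: it assumes the checkpoint named in the README's `duplicated_from` field (`diffusers/text-to-video-ms-1.7b`) loads through the standard diffusers `DiffusionPipeline` interface and that a CUDA GPU with fp16 support is available; the prompt string is illustrative only.

```python
# Minimal sketch of invoking the text-to-video pipeline described above.
# Assumptions (not stated in this commit): the `duplicated_from` checkpoint
# is a loadable diffusers pipeline, and a CUDA GPU is available for fp16.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Checkpoint ID taken from the `duplicated_from` field in the README header.
pipe = DiffusionPipeline.from_pretrained(
    "diffusers/text-to-video-ms-1.7b",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Only English prompts are supported, per the model description.
prompt = "A panda eating bamboo on a rock"

# The UNet3D iteratively denoises a pure Gaussian-noise latent video;
# the decoder then maps the final latent back to RGB frames.
video_frames = pipe(prompt, num_inference_steps=25).frames

# export_to_video writes the frames to an .mp4 file and returns its path.
video_path = export_to_video(video_frames)
print(video_path)
```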