THUDM/CogVideoX-2b

@@ -88,18 +88,18 @@ inference: false
 CogVideoX is an open-source video generation model that shares the same origins as [清影](https://chatglm.cn/video).
 The table below lists the video generation models we currently offer, along with their basic information.
-| Model Name                                 | CogVideoX-2B (Current Repos)                  |
-|--------------------------------------------|-----------------------------------------------|
-| Supported Prompt Language                  | English                                       |
-| GPU Memory Required for Inference          | 36GB (will be optimized before the PR is merged) |
-| GPU Memory Required for Fine-tuning (bs=1) | 42GB                                          |
-| Prompt Length                              | 226 Tokens                                    |
-| Video Length                               | 6 seconds                                     |
-| Frames Per Second                          | 8 frames                                      |
-| Resolution                                 | 720 * 480                                     |
-| Positional Embeddings                      | Sinusoidal                                    |
-| Quantized Inference                        | Not Supported                                 |
-| Multi-card Inference                       | Not Supported                                 |
+| Model Name                                 | CogVideoX-2B (Current Repos) |
+|--------------------------------------------|------------------------------|
+| Supported Prompt Language                  | English                      |
+| GPU Memory Required for Inference          | 36GB                         |
+| GPU Memory Required for Fine-tuning (bs=1) | 42GB                         |
+| Prompt Length                              | 226 Tokens                   |
+| Video Length                               | 6 seconds                    |
+| Frames Per Second                          | 8 frames                     |
+| Resolution                                 | 720 * 480                    |
+| Positional Embeddings                      | Sinusoidal                   |
+| Quantized Inference                        | Not Supported                |
+| Multi-card Inference                       | Not Supported                |
 **Note** Inference with the [SAT](https://github.com/THUDM/SwissArmyTransformer) version of the model requires 18GB. See our GitHub repository for details.
@@ -113,8 +113,7 @@ optimizations and conversions to get a better experience.**
 1. Install the required dependencies
 ```shell
-pip install --upgrade opencv-python transformers
-pip install git+https://github.com/huggingface/diffusers.git@878f609aa5ce4a78fea0f048726889debde1d7e8#egg=diffusers # Still in PR
+pip install --upgrade opencv-python transformers diffusers # Requires diffusers>=0.30.0
 ```
 2. Run the code
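The updated install step switches from a pinned Git commit of `diffusers` to a released version, with a stated floor of 0.30.0. Below is a minimal sketch, assuming Python 3.8+ and only the standard library, for confirming that the installed `diffusers` meets that floor before running the code. The helpers `parse_release`, `meets_minimum`, and `check_installed_diffusers` are hypothetical names, not part of the repository. The comparison also deliberately looks only at the numeric release segment, so a pre-release such as `0.30.0.dev0` is treated as 0.30.0. If you need full PEP 440 ordering, use `packaging.version` instead.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM_DIFFUSERS = "0.30.0"  # floor stated in the install comment


def parse_release(ver: str) -> tuple:
    """Return the leading numeric release segment, e.g. '0.30.1' -> (0, 30, 1).

    Simplification (assumption): suffixes such as '.dev0' or 'rc1' are
    ignored rather than ordered according to PEP 440.
    """
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # a non-numeric component like 'dev0' ends the release segment
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # a component like '0rc1' carries a pre-release tag; stop here
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MINIMUM_DIFFUSERS) -> bool:
    """True if the installed release segment is >= the minimum."""
    a, b = parse_release(installed), parse_release(minimum)
    # Pad the shorter tuple with zeros so '0.30' compares equal to '0.30.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed_diffusers(minimum: str = MINIMUM_DIFFUSERS) -> str:
    """Describe whether the installed diffusers satisfies the minimum version."""
    try:
        installed = version("diffusers")
    except PackageNotFoundError:
        return "diffusers is not installed"
    if meets_minimum(installed, minimum):
        return f"diffusers {installed} OK"
    return f"diffusers {installed} is too old; need >= {minimum}"
```

Calling `print(check_installed_diffusers())` before step 2 reports whether the environment is ready. The helper returns a message instead of exiting, so you can reuse it in a notebook as well as in a script.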

README_zh.md CHANGED

@@ -76,7 +76,7 @@ CogVideoX is the open-source video generation model that shares the same origins as [清影](https://chatglm.cn/video)
 | Model Name                                 | CogVideoX-2B (Current Repo) |
 |--------------------------------------------|-----------------------------|
 | Prompt Language                            | English                     |
-| GPU Memory Required for Inference          | 36GB (will be optimized before the PR is merged) |
+| GPU Memory Required for Inference          | 36GB                        |
 | GPU Memory Required for Fine-tuning (bs=1) | 42GB                        |
 | Prompt Length Limit                        | 226 Tokens                  |
 | Video Length                               | 6 seconds                   |
@@ -97,8 +97,7 @@ CogVideoX is the open-source video generation model that shares the same origins as [清影](https://chatglm.cn/video)
 1. Install the required dependencies
 ```shell
-pip install --upgrade opencv-python transformers acc
-pip install git+https://github.com/huggingface/diffusers.git@878f609aa5ce4a78fea0f048726889debde1d7e8#egg=diffusers # Still in PR
+pip install --upgrade opencv-python transformers accelerate diffusers # Requires diffusers>=0.30.0
 ```
 2. Run the code
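The Chinese README's install line also lists `accelerate` alongside `opencv-python`, `transformers`, and `diffusers`. Below is a minimal sketch, assuming exactly those four pip packages, that reports which of them cannot be imported in the current environment. Note that `opencv-python` is imported under the module name `cv2`. The `REQUIRED` mapping and the `missing_modules` helper are illustrative names, not part of the repository.

```python
import importlib.util

# pip package name -> top-level module name it provides
# (opencv-python installs the `cv2` module; the others match their pip names)
REQUIRED = {
    "opencv-python": "cv2",
    "transformers": "transformers",
    "accelerate": "accelerate",
    "diffusers": "diffusers",
}


def missing_modules(required: dict) -> list:
    """Return the pip names whose top-level module cannot be found.

    importlib.util.find_spec returns None for a top-level module that is
    not installed, so nothing is actually imported here.
    """
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]
```

For example, `missing = missing_modules(REQUIRED)` followed by `print("pip install --upgrade " + " ".join(missing))` when the list is non-empty prints the packages still to install. Checking with `find_spec` instead of importing avoids loading heavy libraries such as `transformers` just to test for their presence.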