zR committed
Commit 4bbfb1d
1 Parent(s): 0d6a353

update GPU memory to 24GB

Files changed (2)
  1. README.md +14 -13
  2. README_zh.md +14 -13
README.md CHANGED
@@ -88,18 +88,17 @@ inference: false
 CogVideoX is an open-source video generation model that shares the same origins as [清影](https://chatglm.cn/video).
 The table below provides a list of the video generation models we currently offer, along with their basic information.
 
- | Model Name                                 | CogVideoX-2B (Current Repos) |
- |--------------------------------------------|------------------------------|
- | Supported Prompt Language                  | English                      |
- | GPU Memory Required for Inference          | 36GB                         |
- | GPU Memory Required for Fine-tuning (bs=1) | 42GB                         |
- | Prompt Length                              | 226 Tokens                   |
- | Video Length                               | 6 seconds                    |
- | Frames Per Second                          | 8 frames                     |
- | Resolution                                 | 720 * 480                    |
- | Positional Embeddings                      | Sinusoidal                   |
- | Quantized Inference                        | Not Supported                |
- | Multi-card Inference                       | Not Supported                |
+ | Model Name                                 | CogVideoX-2B                         |
+ |--------------------------------------------|--------------------------------------|
+ | Prompt Language                            | English                              |
+ | Single GPU Inference (FP16)                | 23.9GB                               |
+ | Multi-GPU Inference (FP16)                 | 20GB minimum per GPU using diffusers |
+ | GPU Memory Required for Fine-tuning (bs=1) | 40GB                                 |
+ | Prompt Max Length                          | 226 Tokens                           |
+ | Video Length                               | 6 seconds                            |
+ | Frames Per Second                          | 8 frames                             |
+ | Resolution                                 | 720 * 480                            |
+ | Quantized Inference                        | Not Supported                        |
 
 **Note** Using the [SAT](https://github.com/THUDM/SwissArmyTransformer) version of the model costs only 18GB for inference. Check our GitHub.
 
@@ -128,7 +127,9 @@ prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wood
 pipe = CogVideoXPipeline.from_pretrained(
     "THUDM/CogVideoX-2b",
     torch_dtype=torch.float16
- ).to("cuda")
+ )
+ 
+ pipe.enable_model_cpu_offload()
 
 prompt_embeds, _ = pipe.encode_prompt(
     prompt=prompt,
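
The change above swaps `.to("cuda")` for `pipe.enable_model_cpu_offload()`, which keeps only the currently active submodule on the GPU and is what brings single-GPU FP16 inference down to roughly 24GB. A minimal end-to-end sketch of the updated pattern follows; the prompt, `num_inference_steps`, `guidance_scale`, and the `export_to_video` call are illustrative assumptions rather than values taken from this diff.

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Placeholder prompt; the README's panda prompt is truncated in this view.
prompt = "A panda plays a small guitar in a quiet bamboo forest."

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",
    torch_dtype=torch.float16,
)

# Instead of pipe.to("cuda"), offload submodules to the CPU and move each one
# onto the GPU only while it runs. This lowers peak VRAM at some speed cost
# and requires the `accelerate` package.
pipe.enable_model_cpu_offload()

video = pipe(
    prompt=prompt,
    num_inference_steps=50,  # illustrative value
    guidance_scale=6.0,      # illustrative value
).frames[0]

export_to_video(video, "output.mp4", fps=8)  # 8 fps matches the table above
```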
README_zh.md CHANGED
@@ -73,18 +73,17 @@
 
 CogVideoX is the open-source video generation model that shares the same origins as [清影](https://chatglm.cn/video). The table below lists the video generation models we currently offer, along with their basic information.
 
- | Model Name                                 | CogVideoX-2B (current repository) |
- |--------------------------------------------|-----------------------------------|
- | Prompt Language                            | English                           |
- | GPU Memory Required for Inference          | 36GB                              |
- | GPU Memory Required for Fine-tuning (bs=1) | 42GB                              |
- | Prompt Max Length                          | 226 Tokens                        |
- | Video Length                               | 6 seconds                         |
- | Frames Per Second                          | 8 frames                          |
- | Resolution                                 | 720 * 480                         |
- | Positional Embeddings                      | Sinusoidal                        |
- | Quantized Inference                        | Not Supported                     |
- | Multi-GPU Inference                        | Not Supported                     |
+ | Model Name                                 | CogVideoX-2B                         |
+ |--------------------------------------------|--------------------------------------|
+ | Prompt Language                            | English                              |
+ | Single GPU Inference (FP16) Memory         | 23.9GB                               |
+ | Multi-GPU Inference (FP16) Memory          | 20GB minimum per GPU using diffusers |
+ | GPU Memory Required for Fine-tuning (bs=1) | 42GB                                 |
+ | Prompt Max Length                          | 226 Tokens                           |
+ | Video Length                               | 6 seconds                            |
+ | Frames Per Second                          | 8 frames                             |
+ | Resolution                                 | 720 * 480                            |
+ | Quantized Inference                        | Not Supported                        |
 
 **Note** Inference with the [SAT](https://github.com/THUDM/SwissArmyTransformer) version of the model requires only 18GB of GPU memory. See our GitHub for details.
 
@@ -112,7 +111,9 @@ prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wood
 pipe = CogVideoXPipeline.from_pretrained(
     "THUDM/CogVideoX-2b",
     torch_dtype=torch.float16
- ).to("cuda")
+ )
+ 
+ pipe.enable_model_cpu_offload()
 
 prompt_embeds, _ = pipe.encode_prompt(
     prompt=prompt,
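
If you want to reproduce the 23.9GB figure from the updated tables, PyTorch's peak-memory counters give a rough check. The helper below is a hypothetical sketch (`report_peak_vram` is not part of the repository), and `max_memory_allocated` reports allocated tensor memory, which can differ slightly from what tools such as nvidia-smi show.

```python
import torch

def report_peak_vram(run):
    """Run a callable once and report the peak GPU memory it allocated (hypothetical helper)."""
    torch.cuda.reset_peak_memory_stats()
    result = run()
    peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"Peak GPU memory allocated: {peak_gib:.1f} GiB")
    return result

# Usage, assuming `pipe` and `prompt` are set up as in the snippet above:
# video = report_peak_vram(lambda: pipe(prompt=prompt).frames[0])
```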