StevenZhang committed
Commit 7805d95 · Parent: f5f3d40

Update readme

Files changed (1): README.md (+65 −40)
README.md CHANGED
@@ -40,28 +40,29 @@ widgets:
  max_words: /
  task: image-to-video
  ---

- # Image-to-Video

- 本项目**MS-Image2Video**旨在解决根据输入图像生成高清视频任务。**MS-Image2Video**由达摩院研发的高清视频生成基础模型,其核心部分包含两个阶段,分别解决语义一致性和清晰度的问题,参数量共计约37亿,模型经过在大规模视频和图像数据混合预训练,并在少量精品数据上微调得到,该数据分布广泛、类别多样化,模型对不同的数据均有良好的泛化性。项目于现有的视频生成模型,**MS-Image2Video**在清晰度、质感、语义、时序连续性等方面均具有明显的优势。

- 此外,**MS-Image2Video**的许多设计理念继承于我们已经公开的工作**VideoComposer**,您可以参考我们的[VideoComposer](https://videocomposer.github.io)和本项目的Github代码库了解详细细节

- The **MS-Image2Video** project aims to address the task of generating high-definition videos based on input images. Developed by Alibaba Cloud, the **MS-Image2Video** is a fundamental model for generating high-definition videos. Its core components consist of two stages that address the issues of semantic consistency and clarity, totaling approximately 3.7 billion parameters. The model is pre-trained on a large-scale mix of video and image data and fine-tuned on a small number of high-quality data sets with a wide range of distributions and diverse categories. The model demonstrates good generalization capabilities for different data types. Compared to existing video generation models, **MS-Image2Video** has significant advantages in terms of clarity, texture, semantics, and temporal continuity.

- Additionally, many of the design concepts for **MS-Image2Video** are inherited from our publicly available work, **VideoComposer**. For detailed information, please refer to our [VideoComposer](https://videocomposer.github.io) and the Github code repository for this project.

  <center>
  <p align="center">
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/image/Fig_twostage.png"/>
  <br/>
- Fig.1 MS-Image2Video
  <p>
  </center>

  ## 模型介绍 (Introduction)

- **MS-Image2Video**建立在Stable Diffusion之上,如图Fig.2所示,通过专门设计的时空UNet在隐空间中进行时空建模并通过解码器重建出最终视频。为能够生成720P视频,我们将**MS-Image2Video**分为两个阶段,第一阶段保证语义一致性但低分辨率,第二阶段通过DDIM逆运算并在新的VLDM上进行去噪以提高视频分辨率以及同时提升时间和空间上的一致性。通过在模型、训练和数据上的联合优化,本项目主要具有以下几个特点:

  - 高清&宽屏,可以直接生成720P(1280*720)分辨率的视频,且相比于现有的开源项目,不仅分辨率得到有效提高,其生产的宽屏视频可以适合更多的场景
  - 无水印,模型通过我们内部大规模无水印视频/图像训练,并在高质量数据微调得到,生成的无水印视频可适用更多视频平台,减少许多限制
@@ -70,7 +71,7 @@ Additionally, many of the design concepts for **MS-Image2Video** are inherited f

  以下为生成的部分案例:

- **MS-Image2Video** is built on Stable Diffusion, as shown in Fig.2, and uses a specially designed spatiotemporal UNet to perform spatiotemporal modeling in the latent space, and then reconstructs the final video through the decoder. In order to generate 720P videos, **MS-Image2Video** is divided into two stages. The first stage guarantees semantic consistency but with low resolution, while the second stage uses the DDIM inverse operation and applies denoising on a new VLDM to improve the resolution and spatiotemporal consistency of the video. Through joint optimization of the model, training, and data, this project has the following characteristics:

  - High-definition & widescreen, can directly generate 720P (1280*720) resolution videos, and compared to existing open source projects, not only is the resolution effectively improved, but the widescreen videos it produces can also be suitable for more scenarios.
  - No watermark, the model is trained on a large-scale watermark-free video/image dataset internally and fine-tuned on high-quality data, generating watermark-free videos that can be applied to more video platforms and reducing many restrictions.
@@ -82,7 +83,7 @@ Below are some examples generated by the model:

  <center>
  <p align="center">
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/image/fig1_overview.jpg"/>
  <br/>
  Fig.2 VLDM
  <p>
@@ -96,10 +97,10 @@ Below are some examples generated by the model:
  <table><center>
  <tr>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/dragon2_rank_02-00-0021-001024.gif"/>
  </center></td>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/laoshu_rank_02-01-0810-001024.gif"/>
  </center></td>
  </tr>
  <tr>
@@ -112,10 +113,10 @@ Below are some examples generated by the model:
  </tr>
  <tr>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/ac10af0b1c524b778aff60be5b7ecc4f_2_02_00_0065_rank_02-00-1256-001024.gif"/>
  </center></td>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/ast_rank_02-00-0773-001024.gif"/>
  </center></td>
  </tr>
  <tr>
@@ -128,10 +129,10 @@ Below are some examples generated by the model:
  </tr>
  <tr>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/e3733444344741f1970cf2e92e617182_1_02_00_0199.gif"/>
  </center></td>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/b307dad96c3d440e80514b1b3f3be5fd_1_rank_02-00-0068-000000.gif"/>
  </center></td>
  </tr>
  <tr>
@@ -144,10 +145,10 @@ Below are some examples generated by the model:
  </tr>
  <tr>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/robot1_rank_02-01-0009-009999.gif"/>
  </center></td>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/d82ed4ad01034243ba88eaf9311c1edf_3_02_01_0193.gif"/>
  </center></td>
  </tr>
  <tr>
@@ -160,10 +161,10 @@ Below are some examples generated by the model:
  </tr>
  <tr>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/airship_0_rank_02-00-000000_rank_02-00-0653-001024.gif"/>
  </center></td>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/airship_1_rank_02-01-000000_rank_02-00-1428-001024.gif"/>
  </center></td>
  </tr>
  <tr>
@@ -176,10 +177,10 @@ Below are some examples generated by the model:
  </tr>
  <tr>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/0ba38f2f287f446dac8de87291073e0c_3_rank_02-01-0118-000000.gif"/>
  </center></td>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/03b401c825a2479eaf7b1b3252683a4b_3_02_00_0110_rank_02-00-1009-001024.gif"/>
  </center></td>
  </tr>
  <tr>
@@ -192,10 +193,10 @@ Below are some examples generated by the model:
  </tr>
  <tr>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/3e89356e6bd3470aaf3900b1b34c3ec2_0_rank_02-01-0126-000000.gif"/>
  </center></td>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/6fd21439fce644afa3a2e9b057956d0f_0000000_rank_02-01-0159-001024.gif"/>
  </center></td>
  </tr>
  <tr>
@@ -208,10 +209,10 @@ Below are some examples generated by the model:
  </tr>
  <tr>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/293fdf76aa404971b1fbb66baf9cbaac_1_02_00_0123_rank_02-00-0288-001024.gif"/>
  </center></td>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/426a7bee22034a88872dc8277ddbbf06_0_02_01_0023_rank_02-01-1090-001024.gif"/>
  </center></td>
  </tr>
  <tr>
@@ -224,10 +225,10 @@ Below are some examples generated by the model:
  </tr>
  <tr>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/a15bb09862b74b3c983a54b379912f81_0_02_00_0055_rank_02-01-0443-001024.gif"/>
  </center></td>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/7716d91802614bf9a99174c05bd08f32_3_02_01_0157_rank_02-01-1199-001024.gif"/>
  </center></td>
  </tr>
  <tr>
@@ -240,10 +241,10 @@ Below are some examples generated by the model:
  </tr>
  <tr>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/indian_rank_02-00-0800-001024.gif"/>
  </center></td>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/bike_rank_02-01-0007-001024.gif"/>
  </center></td>
  </tr>
  <tr>
@@ -256,10 +257,10 @@ Below are some examples generated by the model:
  </tr>
  <tr>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/panda_rank_02-01-0007-009999.gif"/>
  </center></td>
  <td ><center>
- <img src="https://huggingface.co/damo-vilab/MS-Image2Video/resolve/main/assets/gif/bf19a66dca0a47799923c47249982ffd_0000000_rank_02-01-0960-001024.gif"/>
  </center></td>
  </tr>
  <tr>
@@ -272,7 +273,8 @@ Below are some examples generated by the model:
  </tr>
  </table>
  </center>
-

  ### 依赖项 (Dependency)

@@ -286,14 +288,14 @@ sudo apt-get update && apt-get install ffmpeg libsm6 libxext6 -y
  ```


- 其次,本**MS-Image2Video**项目适配ModelScope代码库,以下是本项目需要安装的部分依赖项。

- The **MS-Image2Video** project is compatible with the ModelScope codebase, and the following are some of the dependencies that need to be installed for this project.


  ```bash
- pip install modelscope==1.4.2
- pip install -U xformers
  pip install torch==2.0.1
  pip install open_clip_torch>=2.0.2
  pip install opencv-python-headless
@@ -304,6 +306,7 @@ pip install fairscale
  pip install scipy
  pip install imageio
  pip install pytorch-lightning

  ```

@@ -319,18 +322,40 @@ For more experiments, please stay tuned for our upcoming technical report and op
  from modelscope.pipelines import pipeline
  from modelscope.outputs import OutputKeys

- pipe = pipeline("image-to-video", 'damo/Image-to-Video')

  # IMG_PATH: your image path (url or local file)
  output_video_path = pipe(IMG_PATH, output_video='./output.mp4')[OutputKeys.OUTPUT_VIDEO]
  print(output_video_path)
  ```

  ### 模型局限 (Limitation)

- 本**MS-Image2Video**项目的模型在处理以下情况会存在局限性:
  - 小目标生成能力有限,在生成较小目标的时候,会存在一定的错误
  - 快速运动目标生成能力有限,当生成快速运动目标时,会存在一定的假象
  - 生成速度较慢,生成高清视频会明显导致生成速度减慢
@@ -338,7 +363,7 @@ print(output_video_path)
  此外,我们研究也发现,生成的视频空间上的质量和时序上的变化速度在一定程度上存在互斥现象,在本项目我们选择了其折中的模型,兼顾两则的平衡。

- The model of the **MS-Image2Video** project has limitations in the following scenarios:
  - Limited ability to generate small objects: There may be some errors when generating smaller objects.
  - Limited ability to generate fast-moving objects: There may be some artifacts when generating fast-moving objects.
  - Slow generation speed: Generating high-definition videos significantly slows down the generation speed.
 
  max_words: /
  task: image-to-video
  ---
+ ## 模型介绍 (Introduction)

+ # I2VGen-XL高清图像生成视频大模型

+ 本项目**I2VGen-XL**旨在解决根据输入图像生成高清视频任务。**I2VGen-XL**由达摩院研发的高清视频生成基础模型,其核心部分包含两个阶段,分别解决语义一致性和清晰度的问题,参数量共计约37亿,模型经过在大规模视频和图像数据混合预训练,并在少量精品数据上微调得到,该数据分布广泛、类别多样化,模型对不同的数据均有良好的泛化性。项目于现有的视频生成模型,**I2VGen-XL**在清晰度、质感、语义、时序连续性等方面均具有明显的优势。

+ 此外,**I2VGen-XL**的许多设计理念继承于我们已经公开的工作**VideoComposer**,您可以参考我们的[VideoComposer](https://videocomposer.github.io)和本项目的Github代码库了解详细细节

+ The **I2VGen-XL** project aims to address the task of generating high-definition videos based on input images. Developed by Alibaba Cloud, the **I2VGen-XL** is a fundamental model for generating high-definition videos. Its core components consist of two stages that address the issues of semantic consistency and clarity, totaling approximately 3.7 billion parameters. The model is pre-trained on a large-scale mix of video and image data and fine-tuned on a small number of high-quality data sets with a wide range of distributions and diverse categories. The model demonstrates good generalization capabilities for different data types. Compared to existing video generation models, **I2VGen-XL** has significant advantages in terms of clarity, texture, semantics, and temporal continuity.

+ Additionally, many of the design concepts for **I2VGen-XL** are inherited from our publicly available work, **VideoComposer**. For detailed information, please refer to our [VideoComposer](https://videocomposer.github.io) and the Github code repository for this project.

  <center>
  <p align="center">
+ <img src="assets/image/Fig_twostage.png" style="max-width: none;"/>
  <br/>
+ Fig.1 I2VGen-XL
  <p>
  </center>

  ## 模型介绍 (Introduction)

+ **I2VGen-XL**建立在Stable Diffusion之上,如图Fig.2所示,通过专门设计的时空UNet在隐空间中进行时空建模并通过解码器重建出最终视频。为能够生成720P视频,我们将**I2VGen-XL**分为两个阶段,第一阶段保证语义一致性但低分辨率,第二阶段通过DDIM逆运算并在新的VLDM上进行去噪以提高视频分辨率以及同时提升时间和空间上的一致性。通过在模型、训练和数据上的联合优化,本项目主要具有以下几个特点:

  - 高清&宽屏,可以直接生成720P(1280*720)分辨率的视频,且相比于现有的开源项目,不仅分辨率得到有效提高,其生产的宽屏视频可以适合更多的场景
  - 无水印,模型通过我们内部大规模无水印视频/图像训练,并在高质量数据微调得到,生成的无水印视频可适用更多视频平台,减少许多限制
 

  以下为生成的部分案例:

+ **I2VGen-XL** is built on Stable Diffusion, as shown in Fig.2, and uses a specially designed spatiotemporal UNet to perform spatiotemporal modeling in the latent space, and then reconstructs the final video through the decoder. In order to generate 720P videos, **I2VGen-XL** is divided into two stages. The first stage guarantees semantic consistency but with low resolution, while the second stage uses the DDIM inverse operation and applies denoising on a new VLDM to improve the resolution and spatiotemporal consistency of the video. Through joint optimization of the model, training, and data, this project has the following characteristics:

  - High-definition & widescreen, can directly generate 720P (1280*720) resolution videos, and compared to existing open source projects, not only is the resolution effectively improved, but the widescreen videos it produces can also be suitable for more scenarios.
  - No watermark, the model is trained on a large-scale watermark-free video/image dataset internally and fine-tuned on high-quality data, generating watermark-free videos that can be applied to more video platforms and reducing many restrictions.
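
For reference, the "DDIM inverse operation" linking the two stages corresponds to the standard deterministic DDIM formulation (notation is ours, not taken from this repository): the first-stage latent is pushed forward in noise level via inversion, then denoised by the second-stage VLDM at higher resolution.

```latex
% Predicted clean latent from the noise estimate at step t
\hat{x}_0 = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}

% Deterministic DDIM inversion: one step from noise level t to t+1
x_{t+1} = \sqrt{\bar{\alpha}_{t+1}}\,\hat{x}_0 + \sqrt{1 - \bar{\alpha}_{t+1}}\,\epsilon_\theta(x_t, t)
```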
 

  <center>
  <p align="center">
+ <img src="assets/image/fig1_overview.jpg" style="max-width: none;"/>
  <br/>
  Fig.2 VLDM
  <p>
 
  <table><center>
  <tr>
  <td ><center>
+ <img src="assets/gif/dragon2_rank_02-00-0021-001024.gif"/>
  </center></td>
  <td ><center>
+ <img src="assets/gif/laoshu_rank_02-01-0810-001024.gif"/>
  </center></td>
  </tr>
  <tr>

  </tr>
  <tr>
  <td ><center>
+ <img src="assets/gif/ac10af0b1c524b778aff60be5b7ecc4f_2_02_00_0065_rank_02-00-1256-001024.gif"/>
  </center></td>
  <td ><center>
+ <img src="assets/gif/ast_rank_02-00-0773-001024.gif"/>
  </center></td>
  </tr>
  <tr>

  </tr>
  <tr>
  <td ><center>
+ <img src="assets/gif/e3733444344741f1970cf2e92e617182_1_02_00_0199.gif"/>
  </center></td>
  <td ><center>
+ <img src="assets/gif/b307dad96c3d440e80514b1b3f3be5fd_1_rank_02-00-0068-000000.gif"/>
  </center></td>
  </tr>
  <tr>

  </tr>
  <tr>
  <td ><center>
+ <img src="assets/gif/robot1_rank_02-01-0009-009999.gif"/>
  </center></td>
  <td ><center>
+ <img src="assets/gif/d82ed4ad01034243ba88eaf9311c1edf_3_02_01_0193.gif"/>
  </center></td>
  </tr>
  <tr>

  </tr>
  <tr>
  <td ><center>
+ <img src="assets/gif/airship_0_rank_02-00-000000_rank_02-00-0653-001024.gif"/>
  </center></td>
  <td ><center>
+ <img src="assets/gif/airship_1_rank_02-01-000000_rank_02-00-1428-001024.gif"/>
  </center></td>
  </tr>
  <tr>

  </tr>
  <tr>
  <td ><center>
+ <img src="assets/gif/0ba38f2f287f446dac8de87291073e0c_3_rank_02-01-0118-000000.gif"/>
  </center></td>
  <td ><center>
+ <img src="assets/gif/03b401c825a2479eaf7b1b3252683a4b_3_02_00_0110_rank_02-00-1009-001024.gif"/>
  </center></td>
  </tr>
  <tr>

  </tr>
  <tr>
  <td ><center>
+ <img src="assets/gif/3e89356e6bd3470aaf3900b1b34c3ec2_0_rank_02-01-0126-000000.gif"/>
  </center></td>
  <td ><center>
+ <img src="assets/gif/6fd21439fce644afa3a2e9b057956d0f_0000000_rank_02-01-0159-001024.gif"/>
  </center></td>
  </tr>
  <tr>

  </tr>
  <tr>
  <td ><center>
+ <img src="assets/gif/293fdf76aa404971b1fbb66baf9cbaac_1_02_00_0123_rank_02-00-0288-001024.gif"/>
  </center></td>
  <td ><center>
+ <img src="assets/gif/426a7bee22034a88872dc8277ddbbf06_0_02_01_0023_rank_02-01-1090-001024.gif"/>
  </center></td>
  </tr>
  <tr>

  </tr>
  <tr>
  <td ><center>
+ <img src="assets/gif/a15bb09862b74b3c983a54b379912f81_0_02_00_0055_rank_02-01-0443-001024.gif"/>
  </center></td>
  <td ><center>
+ <img src="assets/gif/7716d91802614bf9a99174c05bd08f32_3_02_01_0157_rank_02-01-1199-001024.gif"/>
  </center></td>
  </tr>
  <tr>

  </tr>
  <tr>
  <td ><center>
+ <img src="assets/gif/indian_rank_02-00-0800-001024.gif"/>
  </center></td>
  <td ><center>
+ <img src="assets/gif/bike_rank_02-01-0007-001024.gif"/>
  </center></td>
  </tr>
  <tr>

  </tr>
  <tr>
  <td ><center>
+ <img src="assets/gif/panda_rank_02-01-0007-009999.gif"/>
  </center></td>
  <td ><center>
+ <img src="assets/gif/bf19a66dca0a47799923c47249982ffd_0000000_rank_02-01-0960-001024.gif"/>
  </center></td>
  </tr>
  <tr>
 
  </tr>
  </table>
  </center>
+
+ > [<font color="#dd0000">2023.08.25 更新</font>] ModelScope发布1.8.4版本,I2VGen-XL模型更新到模型参数文件 v1.1.0;

  ### 依赖项 (Dependency)

 
  ```


+ 其次,本**I2VGen-XL**项目适配ModelScope代码库,以下是本项目需要安装的部分依赖项。

+ The **I2VGen-XL** project is compatible with the ModelScope codebase, and the following are some of the dependencies that need to be installed for this project.


  ```bash
+ pip install modelscope==1.8.4
+ pip install xformers==0.0.20
  pip install torch==2.0.1
  pip install open_clip_torch>=2.0.2
  pip install opencv-python-headless
  pip install scipy
  pip install imageio
  pip install pytorch-lightning
+ pip install torchsde
  ```

 
  from modelscope.pipelines import pipeline
  from modelscope.outputs import OutputKeys

+ pipe = pipeline(task='image-to-video', model='damo/Image-to-Video', model_revision='v1.1.0')

  # IMG_PATH: your image path (url or local file)
  output_video_path = pipe(IMG_PATH, output_video='./output.mp4')[OutputKeys.OUTPUT_VIDEO]
  print(output_video_path)
  ```

+ 如果想生成超分视频的话, 示例见下:
+
+ If you want to generate a super-resolution video, please use the following code:
+
+ ```python
+ from modelscope.pipelines import pipeline
+ from modelscope.outputs import OutputKeys
+
+ # if you only have one GPU, make sure it has more than 50 GB of memory; otherwise use two GPUs and assign them via `device`
+ pipe1 = pipeline(task='image-to-video', model='damo/Image-to-Video', model_revision='v1.1.0', device='cuda:0')
+ pipe2 = pipeline(task='video-to-video', model='damo/Video-to-Video', model_revision='v1.1.0', device='cuda:0')
+
+ # image to video
+ output_video_path = pipe1("test.jpg", output_video='./i2v_output.mp4')[OutputKeys.OUTPUT_VIDEO]
+
+ # video super-resolution
+ p_input = {'video_path': output_video_path}
+ new_output_video_path = pipe2(p_input, output_video='./v2v_output.mp4')[OutputKeys.OUTPUT_VIDEO]
+ ```
+ 更多超分细节, 请访问 <a href="https://modelscope.cn/models/damo/Video-to-Video/summary">Video-to-Video</a>。 我们也提供了用户接口,请移步<a href="https://modelscope.cn/studios/damo/I2VGen-XL-Demo/summary">I2VGen-XL-Demo</a>。
+
+ Please visit <a href="https://modelscope.cn/models/damo/Video-to-Video/summary">Video-to-Video</a> for more details. We also provide a user interface: <a href="https://modelscope.cn/studios/damo/I2VGen-XL-Demo/summary">I2VGen-XL-Demo</a>.
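
The single-image snippet above extends naturally to a folder of images. A minimal batching sketch follows; the `output_path_for` and `generate_all` helpers are our own illustration, not part of the ModelScope API, and `generate_all` assumes a `pipe` constructed as shown above in a GPU environment with modelscope installed.

```python
from pathlib import Path

def output_path_for(image_path: str, out_dir: str = "outputs") -> str:
    """Map an input image path to an .mp4 output path, e.g. imgs/cat.jpg -> outputs/cat.mp4."""
    return str(Path(out_dir) / (Path(image_path).stem + ".mp4"))

def generate_all(image_paths, pipe):
    """Run an image-to-video pipeline over many images.

    `pipe` is assumed to be a ModelScope pipeline constructed as in the
    snippet above; this wrapper only handles output paths and iteration.
    """
    from modelscope.outputs import OutputKeys  # requires modelscope to be installed
    Path("outputs").mkdir(exist_ok=True)
    return [
        pipe(img, output_video=output_path_for(img))[OutputKeys.OUTPUT_VIDEO]
        for img in image_paths
    ]
```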
 
  ### 模型局限 (Limitation)

+ 本**I2VGen-XL**项目的模型在处理以下情况会存在局限性:
  - 小目标生成能力有限,在生成较小目标的时候,会存在一定的错误
  - 快速运动目标生成能力有限,当生成快速运动目标时,会存在一定的假象
  - 生成速度较慢,生成高清视频会明显导致生成速度减慢

  此外,我们研究也发现,生成的视频空间上的质量和时序上的变化速度在一定程度上存在互斥现象,在本项目我们选择了其折中的模型,兼顾两则的平衡。

+ The model of the **I2VGen-XL** project has limitations in the following scenarios:
  - Limited ability to generate small objects: There may be some errors when generating smaller objects.
  - Limited ability to generate fast-moving objects: There may be some artifacts when generating fast-moving objects.
  - Slow generation speed: Generating high-definition videos significantly slows down the generation speed.