anchor committed
Commit: f725526
Parent: a893898

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -23,7 +23,7 @@ Kwok-Wai Hung,
 Chao Zhan,
 Yingjie He,
 Wenjiang Zhou
-(<sup>*</sup>Equal Contribution, <sup>†</sup>Corresponding Author)
+(<sup>*</sup>Equal Contribution, <sup>†</sup>Corresponding Author, benbinwu@tencent.com)
 </br>
 Lyra Lab, Tencent Music Entertainment
 
@@ -352,14 +352,16 @@ please refer to [MuseV](https://github.com/TMElyralab/MuseV)
 
 # Acknowledgements
 
-1. MuseV has referred much to [TuneAVideo](https://github.com/showlab/Tune-A-Video), [diffusers](https://github.com/huggingface/diffusers), [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone/tree/master/src/pipelines), [animatediff](https://github.com/guoyww/AnimateDiff), [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter), [AnimateAnyone](https://arxiv.org/abs/2311.17117), [VideoFusion](https://arxiv.org/abs/2303.08320).
+1. MuseV has referred much to [TuneAVideo](https://github.com/showlab/Tune-A-Video), [diffusers](https://github.com/huggingface/diffusers), [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone/tree/master/src/pipelines), [animatediff](https://github.com/guoyww/AnimateDiff), [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter), [AnimateAnyone](https://arxiv.org/abs/2311.17117), [VideoFusion](https://arxiv.org/abs/2303.08320), [insightface](https://github.com/deepinsight/insightface).
 2. MuseV has been built on the `ucf101` and `webvid` datasets.
 
 Thanks for open-sourcing!
 
+
 # Limitation
 There are still many limitations, including
 
+1. Limited generalization ability: some visual condition images and some pretrained t2i models perform well, while others perform poorly.
 1. Limited types of video generation and limited motion range, partly because of the limited variety of training data. The released `MuseV` was trained on approximately 60K human text-video pairs at `512*320` resolution. At lower resolution, `MuseV` achieves a greater motion range but lower video quality; it tends to generate a smaller motion range when video quality is high. Training on a larger, higher-resolution, higher-quality text-video dataset may improve `MuseV`.
 1. Watermarks may appear because of `webvid`. A cleaner dataset without watermarks may solve this issue.
 1. Limited types of long video generation. Visual Conditioned Parallel Denoise can reduce the accumulated error of video generation, but the current method is only suitable for relatively fixed camera scenes.