arxiv:2401.09084

UniVG: Towards UNIfied-modal Video Generation

Published on Jan 17
· Submitted by akhaliq on Jan 18

Abstract

Diffusion-based video generation has received extensive attention and achieved considerable success in both academia and industry. However, current efforts concentrate mainly on single-objective or single-task video generation, such as generation driven by text, by an image, or by a combination of text and image. This cannot fully meet the needs of real-world applications, since users are likely to supply image and text conditions flexibly, either individually or in combination. To address this, we propose a Unified-modal Video Generation system capable of handling multiple video generation tasks across text and image modalities. To this end, we revisit the video generation tasks within our system from the perspective of generative freedom and classify them into high-freedom and low-freedom categories. For high-freedom video generation, we employ Multi-condition Cross Attention to generate videos that align with the semantics of the input images or text. For low-freedom video generation, we introduce Biased Gaussian Noise in place of pure random Gaussian noise, which helps to better preserve the content of the input conditions. Our method achieves the lowest Fréchet Video Distance (FVD) on the public academic benchmark MSR-VTT, surpasses current open-source methods in human evaluations, and is on par with the current closed-source method Gen2. For more samples, visit https://univg-baidu.github.io.
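The abstract names its two mechanisms only at a high level, so the following is a minimal NumPy sketch under stated assumptions, not the paper's formulation. For Biased Gaussian Noise, it assumes one plausible reading: sampling starts from a forward-diffused copy of the condition latent instead of pure noise, similar in spirit to SDEdit-style initialization. For Multi-condition Cross Attention, it assumes text and image condition tokens share a width and are concatenated so each video query attends to both. The names `biased_initial_noise`, `multi_condition_cross_attention`, `alpha_bar_t`, and all shapes are hypothetical.

```python
import numpy as np


def biased_initial_noise(cond_latent, alpha_bar_t, rng):
    """Hypothetical biased starting point for low-freedom generation.

    Assumption (not from the paper): instead of x_T ~ N(0, I), start from
    x_t = sqrt(alpha_bar_t) * x_cond + sqrt(1 - alpha_bar_t) * eps.
    alpha_bar_t = 0 recovers pure Gaussian noise; alpha_bar_t = 1 returns x_cond.
    """
    eps = rng.standard_normal(cond_latent.shape)
    return np.sqrt(alpha_bar_t) * cond_latent + np.sqrt(1.0 - alpha_bar_t) * eps


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_condition_cross_attention(query, text_tokens, image_tokens, w_q, w_k, w_v):
    """Hypothetical single-head cross attention over text and image conditions.

    Assumption: both condition streams have width d_c and are concatenated along
    the token axis. One stream may be empty (text-only or image-only input),
    but not both.
    """
    cond = np.concatenate([text_tokens, image_tokens], axis=0)  # (n_text + n_img, d_c)
    q = query @ w_q  # (n_query, d)
    k = cond @ w_k   # (n_cond, d)
    v = cond @ w_v   # (n_cond, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    return softmax(scores, axis=-1) @ v      # (n_query, d)


# Toy usage with made-up sizes.
rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 8, 8))  # (frames, height, width) condition latent
x_start = biased_initial_noise(latent, alpha_bar_t=0.3, rng=rng)

d_q, d_c, d = 16, 12, 8
out = multi_condition_cross_attention(
    rng.standard_normal((10, d_q)),  # 10 video query tokens
    rng.standard_normal((5, d_c)),   # 5 text tokens
    np.zeros((0, d_c)),              # no image tokens: text-only case
    rng.standard_normal((d_q, d)),
    rng.standard_normal((d_c, d)),
    rng.standard_normal((d_c, d)),
)
print(x_start.shape, out.shape)
```

Under these assumptions, `alpha_bar_t` controls how much of the input condition survives in the starting point, which matches the abstract's stated goal of preserving condition content for low-freedom tasks; concatenating condition tokens lets one attention layer serve text-only, image-only, and combined inputs.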

Community

car

Generate a video of a husky walking by the sea.

No matter how beautiful the scenery is, if you have passed by, you must leave.

How to evaluate this video?

Hello

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space.

Frame 7: Encounter with colleague Ricky

Description: The rescue boat arrives, and the protagonist and colleague Ricky hug and exchange greetings, feeling overwhelmed.

Models citing this paper 0

No model linking this paper

Datasets citing this paper 0

No dataset linking this paper

Spaces citing this paper 0

No Space linking this paper

Collections including this paper 6