Papers
arxiv:2405.18991

EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

Published on May 29
· Submitted by akhaliq on May 30
Authors:
,
,
,
,

Abstract

This paper presents EasyAnimate, an advanced method for video generation that leverages the power of transformer architecture for high-performance outcomes. We have expanded the DiT framework originally designed for 2D image synthesis to accommodate the complexities of 3D video generation by incorporating a motion module block. It is used to capture temporal dynamics, thereby ensuring the production of consistent frames and seamless motion transitions. The motion module can be adapted to various DiT baseline methods to generate video with different styles. It can also generate videos with different frame rates and resolutions during both training and inference phases, suitable for both images and videos. Moreover, we introduce slice VAE, a novel approach to condense the temporal axis, facilitating the generation of long duration videos. Currently, EasyAnimate exhibits the proficiency to generate videos with 144 frames. We provide a holistic ecosystem for video production based on DiT, encompassing aspects such as data pre-processing, VAE training, DiT models training (both the baseline model and LoRA model), and end-to-end video inference. Code is available at: https://github.com/aigc-apps/EasyAnimate. We are continuously working to enhance the performance of our method.

Community

Paper author
edited May 31

Sign up or log in to comment

Models citing this paper 5

Browse 5 models citing this paper

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2405.18991 in a dataset README.md to link it from this page.

Spaces citing this paper 1

Collections including this paper 6