# 🦄️ MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

Muyao Niu 1,2   Xiaodong Cun2,*   Xintao Wang2   Yong Zhang2  
Ying Shan2   Yinqiang Zheng1,*  
1 The University of Tokyo   2 Tencent AI Lab   * Corresponding Author  
---
Check the gallery on our project page for many visual results!
## 📰 CODE RELEASE

- [ ] Gradio demo and checkpoints for trajectory-based image animation (by this weekend)
- [ ] Inference scripts and checkpoints for keypoint-based facial image animation
- [ ] Inference Gradio demo for hybrid image animation
- [ ] Training code

## Introduction
We introduce MOFA-Video, a method designed to adapt motions from different domains to the frozen Video Diffusion Model. By employing sparse-to-dense (S2D) motion generation and flow-based motion adaptation, MOFA-Video can effectively animate a single image using various types of control signals, including trajectories, keypoint sequences, AND their combinations.
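
To make the trajectory-based control concrete, below is a minimal sketch of how user-drawn point trajectories could be rasterized into the sparse flow and mask maps that a sparse-to-dense (S2D) motion generator then densifies. The function name, tensor layout, and the convention of anchoring every hint at the trajectory's first-frame pixel are illustrative assumptions, not the released interface.

```python
import torch

def trajectories_to_sparse_flow(trajectories, height, width, num_frames):
    """Rasterize user-drawn trajectories into per-frame sparse flow hints.

    trajectories: list of float tensors of shape (num_frames, 2), each an
        (x, y) point track starting from its location in the first frame.
    Returns:
        sparse_flow: (num_frames, 2, H, W) displacement from frame 0 to
                     frame t, written at the trajectory's starting pixel.
        mask:        (num_frames, 1, H, W) 1 where a hint exists, 0 elsewhere.
    """
    sparse_flow = torch.zeros(num_frames, 2, height, width)
    mask = torch.zeros(num_frames, 1, height, width)
    for track in trajectories:
        x0, y0 = track[0]
        col, row = int(x0), int(y0)
        if not (0 <= row < height and 0 <= col < width):
            continue  # skip trajectories that start outside the image
        for t in range(num_frames):
            dx, dy = track[t] - track[0]  # displacement w.r.t. frame 0
            sparse_flow[t, 0, row, col] = dx
            sparse_flow[t, 1, row, col] = dy
            mask[t, 0, row, col] = 1.0
    return sparse_flow, mask

# Toy usage: two drag trajectories on a 512x512 image, 25 output frames.
traj_a = torch.tensor([[100.0 + 4 * t, 200.0] for t in range(25)])  # move right
traj_b = torch.tensor([[300.0, 300.0 + 2 * t] for t in range(25)])  # move down
flow, mask = trajectories_to_sparse_flow([traj_a, traj_b], 512, 512, 25)
```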

During the training stage, we generate sparse control signals through sparse motion sampling and then train different MOFA-Adapters to generate videos via the pre-trained SVD. During the inference stage, different MOFA-Adapters can be combined to jointly control the frozen SVD (an illustrative sketch of this combination is given after the Acknowledgements).

## Acknowledgements

Our Gradio demo is based on the early release of [DragNUWA](https://arxiv.org/abs/2308.08089). Our training code is based on [Diffusers](https://github.com/huggingface/diffusers) and [SVD_Xtend](https://github.com/pixeli99/SVD_Xtend). We appreciate the code release of these projects.
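
As a rough illustration of the inference-stage combination described above, the sketch below fuses the multi-scale control features produced by two hypothetical MOFA-Adapters (e.g., one trajectory-based and one keypoint-based) with a simple weighted sum before they would be injected into the frozen SVD UNet. The function, tensor shapes, and fusion rule are assumptions for intuition only, not the released implementation.

```python
import torch

@torch.no_grad()
def combine_adapter_residuals(residuals_per_adapter, weights):
    """Fuse multi-scale control features from several MOFA-Adapters.

    residuals_per_adapter: list (one entry per adapter) of lists of tensors,
        each inner list holding one feature map per UNet resolution.
    weights: per-adapter scalars controlling each signal's influence.
    Returns one fused feature map per resolution; these would then be added
    to the corresponding encoder features of the frozen SVD UNet.
    """
    fused = None
    for feats, w in zip(residuals_per_adapter, weights):
        scaled = [w * f for f in feats]
        fused = scaled if fused is None else [a + b for a, b in zip(fused, scaled)]
    return fused

# Toy usage: features from a trajectory adapter and a facial-keypoint adapter,
# at three assumed UNet resolutions, for a 25-frame clip.
shapes = [(1, 320, 25, 32, 32), (1, 640, 25, 16, 16), (1, 1280, 25, 8, 8)]
traj_feats = [torch.randn(s) for s in shapes]
face_feats = [torch.randn(s) for s in shapes]
fused = combine_adapter_residuals([traj_feats, face_feats], weights=[1.0, 1.0])
```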