## Environment Setup

`pip install -r requirements.txt`

## Download checkpoints

1. Download the pretrained [SVD_xt](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1) checkpoints from Hugging Face to `./ckpts`.

2. Download the [MOFA-Adapter](https://huggingface.co/MyNiuuu/MOFA-Video-Traj) checkpoint from Hugging Face to `./ckpts`.
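
If you prefer to script the two downloads, the following is a minimal sketch using `snapshot_download` from the `huggingface_hub` package (an extra dependency that may not be listed in `requirements.txt`). The repo IDs come from the links above; the target folders are assumptions, and the downloaded files may need to be rearranged to match the checkpoint layout described in this section. The SVD_xt repository may also require accepting its license and logging in to Hugging Face first.

```python
from pathlib import Path

# Root folder for all checkpoints, as used in this README.
CKPT_ROOT = Path("./ckpts")

# (repo_id, local target folder). The target folders are assumptions,
# not something the Hugging Face repositories guarantee.
DOWNLOADS = [
    (
        "stabilityai/stable-video-diffusion-img2vid-xt-1-1",
        CKPT_ROOT / "stable-video-diffusion-img2vid-xt-1-1",
    ),
    ("MyNiuuu/MOFA-Video-Traj", CKPT_ROOT),
]


def download_all(downloads=DOWNLOADS):
    """Download every repository in `downloads` into its target folder."""
    # Imported lazily so the module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in downloads:
        local_dir.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
```

Calling `download_all()` once from a Python shell fetches both repositories. Check the resulting folders against the expected structure before running the demo.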

The final checkpoint directory should be structured as follows:

```text
./ckpts/
|-- controlnet
|   |-- config.json
|   `-- diffusion_pytorch_model.safetensors
`-- stable-video-diffusion-img2vid-xt-1-1
    |-- feature_extractor
    |   `-- ...
    |-- image_encoder
    |   `-- ...
    |-- scheduler
    |   `-- ...
    |-- unet
    |   `-- ...
    |-- unet_ch9
    |   `-- ...
    |-- vae
    |   `-- ...
    |-- svd_xt_1_1.safetensors
    `-- model_index.json
```
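
To catch a misplaced download before launching the demo, a small check like the one below can help. It is only a sketch: the `EXPECTED` list is copied from the tree above, and `missing_entries` is a hypothetical helper, not part of this repository. It only tests that each path exists, not that the files are complete or valid.

```python
from pathlib import Path

# Paths relative to ./ckpts, taken from the directory tree above.
EXPECTED = [
    "controlnet/config.json",
    "controlnet/diffusion_pytorch_model.safetensors",
    "stable-video-diffusion-img2vid-xt-1-1/feature_extractor",
    "stable-video-diffusion-img2vid-xt-1-1/image_encoder",
    "stable-video-diffusion-img2vid-xt-1-1/scheduler",
    "stable-video-diffusion-img2vid-xt-1-1/unet",
    "stable-video-diffusion-img2vid-xt-1-1/unet_ch9",
    "stable-video-diffusion-img2vid-xt-1-1/vae",
    "stable-video-diffusion-img2vid-xt-1-1/svd_xt_1_1.safetensors",
    "stable-video-diffusion-img2vid-xt-1-1/model_index.json",
]


def missing_entries(root="./ckpts", expected=EXPECTED):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


missing = missing_entries()
if missing:
    print("Missing checkpoint entries:")
    for rel in missing:
        print(f"  {rel}")
else:
    print("All expected checkpoint entries are present.")
```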

## Run Gradio Demo

`python run_gradio.py`

Follow the instructions shown in the Gradio interface when running inference.

## Paper

[arXiv:2405.20222](https://arxiv.org/abs/2405.20222)