---
license: mit
---

## Latte: Latent Diffusion Transformer for Video Generation

This repo contains weights pre-trained on FaceForensics, SkyTimelapse, UCF101, and Taichi-HD for our paper exploring latent diffusion models with transformers (Latte). You can find more visualizations on our [project page](https://maxin-cn.github.io/latte_project/).

For pre-trained text-to-video generation weights, see the [LatteT2V repository](https://huggingface.co/maxin-cn/LatteT2V).

## News

- (🔥 New) May. 23, 2024. 💥 **Latte-1** for text-to-video generation is released! You can download the pre-trained model [here](https://huggingface.co/maxin-cn/LatteT2V/tree/main/transformer_v1). Latte-1 also supports text-to-image generation; to try it, run `bash sample/t2i.sh`.

- (🔥 New) Mar. 20, 2024. 💥 An updated LatteT2V model is coming soon, stay tuned!

- (🔥 New) Feb. 24, 2024. 💥 We are very grateful that researchers and developers like our work. We will continue to update our LatteT2V model and hope our efforts help the community. We have created a Latte [Discord](https://discord.gg/RguYqhVU92) channel for discussions. Coders are welcome to contribute.

- (🔥 New) Jan. 9, 2024. 💥 An updated LatteT2V model initialized with [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha) is released; the checkpoint can be found [here](https://huggingface.co/maxin-cn/LatteT2V/tree/main/transformer).

- (🔥 New) Oct. 31, 2023. 💥 The training and inference code is released. All checkpoints (including FaceForensics, SkyTimelapse, UCF101, and Taichi-HD) can be found [here](https://huggingface.co/maxin-cn/Latte/tree/main). In addition, the LatteT2V inference code is provided.

## Contact Us

**Yaohui Wang**: [wangyaohui@pjlab.org.cn](mailto:wangyaohui@pjlab.org.cn)

**Xin Ma**: [xin.ma1@monash.edu](mailto:xin.ma1@monash.edu)

## Citation

If you find this work useful for your research, please consider citing it.

```bibtex
@article{ma2024latte,
  title={Latte: Latent Diffusion Transformer for Video Generation},
  author={Ma, Xin and Wang, Yaohui and Jia, Gengyun and Chen, Xinyuan and Liu, Ziwei and Li, Yuan-Fang and Chen, Cunjian and Qiao, Yu},
  journal={arXiv preprint arXiv:2401.03048},
  year={2024}
}
```

## Acknowledgments

Latte has been greatly inspired by the following amazing works and teams: [DiT](https://github.com/facebookresearch/DiT) and [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha). We thank all the contributors for open-sourcing their work.