---
license: apache-2.0
base_model:
  - THUDM/CogVideoX-5b-I2V
pipeline_tag: image-to-video
---

# SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers

Skyreels Logo

๐ŸŒ Github ยท ๐Ÿ‘‹ Playground

This repo contains Diffusers-style model weights for the SkyReels-A1 model. The inference code is available in the [SkyReels-A1](https://github.com/SkyworkAI/SkyReels-A1) repository; a minimal weight-download sketch is included at the end of this card.

---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62e34a12c9bece303d146af8/Ysbe66shplYZw2fjkFUHL.png)

Overview of the SkyReels-A1 framework. Given an input video sequence and a reference portrait image, we extract facial expression-aware landmarks from the video, which serve as motion descriptors for transferring expressions onto the portrait. Using a conditional video generation framework based on DiT, our approach directly integrates these facial expression-aware landmarks into the input latent space. In alignment with prior research, we employ a pose guidance mechanism constructed within a VAE architecture. This component encodes the facial expression-aware landmarks as conditional input for the DiT framework, enabling the model to capture essential low-dimensional visual attributes while preserving the semantic integrity of facial features.

---

Some generated results:

## Citation

If you find SkyReels-A1 useful for your research, please cite our work using the following BibTeX:

```bibtex
@article{qiu2025skyreels,
  title={SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers},
  author={Qiu, Di and Fei, Zhengcong and Wang, Rui and Bai, Jialin and Yu, Changqian and Fan, Mingyuan and Chen, Guibin and Wen, Xiang},
  journal={arXiv preprint arXiv:2502.10841},
  year={2025}
}
```
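
## Downloading the Weights (Sketch)

As a minimal, non-official sketch of getting started with these weights, the snippet below fetches the checkpoint from the Hugging Face Hub with `huggingface_hub`. The repository id (`Skywork/SkyReels-A1`) and the local directory are assumptions; check this card's URL for the exact repository id.

```python
# Minimal sketch, not the official instructions: download the Diffusers-style
# SkyReels-A1 weights from the Hugging Face Hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="Skywork/SkyReels-A1",        # assumed repository id
    local_dir="checkpoints/SkyReels-A1",  # assumed local path
)
print(f"Weights downloaded to: {local_dir}")
```

The animation pipeline itself (landmark extraction, pose guidance, and DiT-based generation) is driven by the scripts in the [SkyReels-A1](https://github.com/SkyworkAI/SkyReels-A1) GitHub repository, which take the downloaded weights as input.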