Concat-ID: Towards Universal Identity-Preserving Video Synthesis
Abstract
We present Concat-ID, a unified framework for identity-preserving video generation. Concat-ID employs Variational Autoencoders to extract image features, which are concatenated with video latents along the sequence dimension, relying solely on 3D self-attention mechanisms with no additional modules required. A novel cross-video pairing strategy and a multi-stage training regimen are introduced to balance identity consistency and facial editability while enhancing video naturalness. Extensive experiments demonstrate Concat-ID's superiority over existing methods in both single- and multi-identity generation, as well as its seamless scalability to multi-subject scenarios, including virtual try-on and background-controllable generation. Concat-ID establishes a new benchmark for identity-preserving video synthesis, providing a versatile and scalable solution for a wide range of applications.
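To make the core mechanism concrete, here is a minimal sketch (not the authors' released code) of the idea described in the abstract: reference-image latents from a VAE are flattened into tokens and concatenated with the video latent tokens along the sequence dimension, after which a standard self-attention block attends over the joint sequence. The module names, shapes, and the surrounding block structure are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConcatIDBlock(nn.Module):
    """Illustrative transformer block: identity tokens join the video tokens
    along the sequence dimension; no dedicated identity-injection module."""

    def __init__(self, dim: int = 1024, num_heads: int = 16):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, video_tokens: torch.Tensor, id_tokens: torch.Tensor) -> torch.Tensor:
        # video_tokens: (B, N_video, D) flattened spatio-temporal video latents
        # id_tokens:    (B, N_id, D)    flattened reference-image latents from the VAE
        x = torch.cat([video_tokens, id_tokens], dim=1)  # concat along sequence dim
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h)                 # plain self-attention over the joint sequence
        x = x + attn_out
        # keep only the video positions; identity tokens serve purely as context
        return x[:, : video_tokens.shape[1]]

# Usage with dummy tensors standing in for VAE outputs
video_tokens = torch.randn(2, 4096, 1024)  # e.g. flattened frame-patch latents
id_tokens = torch.randn(2, 256, 1024)      # e.g. one flattened reference-image latent
out = ConcatIDBlock()(video_tokens, id_tokens)
print(out.shape)  # torch.Size([2, 4096, 1024])
```

In the paper's full model the attention is the 3D (spatio-temporal) self-attention of a video diffusion transformer; the sketch above only illustrates how sequence-dimension concatenation lets the same attention layers see the identity tokens without any extra cross-attention or adapter module.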
Community
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, were found to be similar to this paper:
- EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion (2025)
- DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability (2025)
- CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance (2025)
- CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers (2025)
- Phantom: Subject-consistent video generation via cross-modal alignment (2025)
- Removing Averaging: Personalized Lip-Sync Driven Characters Based on Identity Adapter (2025)
- SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video (2025)