arxiv:2503.14151

Concat-ID: Towards Universal Identity-Preserving Video Synthesis

Published on Mar 18, 2025 · Submitted by YSH on Mar 19, 2025

Abstract

AI-generated summary: Concat-ID uses Variational Autoencoders and 3D self-attention to generate identity-preserving videos, offering superior scalability and naturalness in both single and multi-identity scenarios.

We present Concat-ID, a unified framework for identity-preserving video generation. Concat-ID employs Variational Autoencoders to extract image features, which are concatenated with video latents along the sequence dimension, leveraging solely 3D self-attention mechanisms without the need for additional modules. A novel cross-video pairing strategy and a multi-stage training regimen are introduced to balance identity consistency and facial editability while enhancing video naturalness. Extensive experiments demonstrate Concat-ID's superiority over existing methods in both single and multi-identity generation, as well as its seamless scalability to multi-subject scenarios, including virtual try-on and background-controllable generation. Concat-ID establishes a new benchmark for identity-preserving video synthesis, providing a versatile and scalable solution for a wide range of applications.
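The concatenation step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the VAE has already produced flattened latent tokens for both the video and the reference image, uses NumPy with random single-head projection weights in place of a trained 3D self-attention layer, places the image tokens after the video tokens, and keeps only the video positions as output. The function name `concat_self_attention`, the token shapes, and all of those choices are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def concat_self_attention(video_tokens, image_tokens, w_q, w_k, w_v):
    """Single-head self-attention over the joint [video ; image] sequence.

    video_tokens: (N_video, d) flattened video latents (assumed VAE output)
    image_tokens: (N_image, d) flattened reference-image latents (assumed VAE output)
    Returns the attended video tokens and the full attention matrix.
    """
    # Concatenate along the sequence dimension; no extra identity module is used.
    tokens = np.concatenate([video_tokens, image_tokens], axis=0)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    out = attn @ v
    # Assumption: only the video positions are kept for later layers.
    return out[: video_tokens.shape[0]], attn


rng = np.random.default_rng(0)
d = 16                                               # hypothetical latent channel size
video_tokens = rng.standard_normal((4 * 6 * 6, d))   # T*H*W = 144 video tokens
image_tokens = rng.standard_normal((6 * 6, d))       # H*W = 36 reference-image tokens
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

video_out, attn = concat_self_attention(video_tokens, image_tokens, w_q, w_k, w_v)
print(video_out.shape, attn.shape)
```

Because every video token attends over the joint sequence, identity information from the reference image can reach the video latents through ordinary self-attention. Under the same assumptions, supporting more identities would only mean appending more image tokens to the sequence.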
