Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation Paper • 2503.19881 • Published 29 days ago
Phantom: Subject-consistent video generation via cross-modal alignment Paper • 2502.11079 • Published Feb 16