MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999 • Published 13 days ago
MixerMDM: Learnable Composition of Human Motion Diffusion Models Paper • 2504.01019 • Published 13 days ago
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors Paper • 2504.01016 • Published 13 days ago
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published 14 days ago
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization Paper • 2503.19901 • Published 20 days ago
MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published 16 days ago
PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos Paper • 2503.17973 • Published 23 days ago
Long-Context Autoregressive Video Modeling with Next-Frame Prediction Paper • 2503.19325 • Published 21 days ago
Concat-ID: Towards Universal Identity-Preserving Video Synthesis Paper • 2503.14151 • Published 28 days ago
AudioX: Diffusion Transformer for Anything-to-Audio Generation Paper • 2503.10522 • Published Mar 13
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published 27 days ago
Edit Transfer: Learning Image Editing via Vision In-Context Relations Paper • 2503.13327 • Published 28 days ago
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation Paper • 2503.06053 • Published Mar 8
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Paper • 2503.12533 • Published 30 days ago