InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 3 days ago • 209
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers Paper • 2504.10483 • Published 3 days ago • 9
Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting Paper • 2504.11092 • Published 2 days ago • 4
Efficient Generative Model Training via Embedded Representation Warmup Paper • 2504.10188 • Published 3 days ago • 8
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography Paper • 2504.07083 • Published 8 days ago • 21
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 9 days ago • 142
SkyReels-A2: Compose Anything in Video Diffusion Transformers Paper • 2504.02436 • Published 14 days ago • 35
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step Paper • 2504.01956 • Published 15 days ago • 38
MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published 19 days ago • 121
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors Paper • 2504.01016 • Published 16 days ago • 28
Easi3R: Estimating Disentangled Motion from DUSt3R Without Training Paper • 2503.24391 • Published 17 days ago • 7
Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model Paper • 2503.22622 • Published 20 days ago • 18
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation Paper • 2503.22194 • Published 20 days ago • 24
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better Paper • 2503.19904 • Published 23 days ago • 2
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing Paper • 2503.19385 • Published 24 days ago • 32
Long-Context Autoregressive Video Modeling with Next-Frame Prediction Paper • 2503.19325 • Published 24 days ago • 71