Less-to-More Generalization: Unlocking More Controllability by In-Context Generation Paper • 2504.02160 • Published 19 days ago • 33
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation Paper • 2412.01316 • Published Dec 2, 2024 • 9
Centroid-centered Modeling for Efficient Vision Transformer Pre-training Paper • 2303.04664 • Published Mar 8, 2023
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos Paper • 2402.06119 • Published Feb 9, 2024 • 1
3D-VLA: A 3D Vision-Language-Action Generative World Model Paper • 2403.09631 • Published Mar 14, 2024 • 10