Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers • Paper • 2503.11579 • Published 5 days ago • 14
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video • Paper • 2503.11647 • Published 5 days ago • 106
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity • Paper • 2503.07677 • Published 9 days ago • 77
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia • Paper • 2503.07920 • Published 8 days ago • 92
"Principal Components" Enable A New Language of Images • Paper • 2503.08685 • Published 8 days ago • 11
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing • Paper • 2502.17258 • Published 23 days ago • 73
Predictive Data Selection: The Data That Predicts Is the Data That Teaches • Paper • 2503.00808 • Published 17 days ago • 54
GHOST 2.0: generative high-fidelity one shot transfer of heads • Paper • 2502.18417 • Published 22 days ago • 63
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference • Paper • 2502.18411 • Published 22 days ago • 69
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention • Paper • 2502.11089 • Published about 1 month ago • 145
Soundwave: Less is More for Speech-Text Alignment in LLMs • Paper • 2502.12900 • Published 29 days ago • 78
Learning Getting-Up Policies for Real-World Humanoid Robots • Paper • 2502.12152 • Published 30 days ago • 37
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding • Paper • 2502.08946 • Published Feb 13 • 184
Training-free Long Video Generation with Chain of Diffusion Model Experts • Paper • 2408.13423 • Published Aug 24, 2024 • 23
TVG: A Training-free Transition Video Generation Method with Diffusion Models • Paper • 2408.13413 • Published Aug 24, 2024 • 14