Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation Paper • 2501.15907 • Published 9 days ago • 15
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published 15 days ago • 48
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise Paper • 2501.08331 • Published 22 days ago • 20
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 94
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 79
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion Paper • 2412.04301 • Published Dec 5, 2024 • 34
One Shot, One Talk: Whole-body Talking Avatar from a Single Image Paper • 2412.01106 • Published Dec 2, 2024 • 20