Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based Qwen2.5 • 3 items • Updated 27 days ago • 90
view article Article Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 Feb 18 • 98
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images Paper • 2501.04689 • Published Jan 8 • 17
Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation Paper • 2501.04144 • Published Jan 7 • 19
On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis Paper • 2501.04377 • Published Jan 8 • 14
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Paper • 2501.04003 • Published Jan 7 • 27
The GAN is dead; long live the GAN! A Modern GAN Baseline Paper • 2501.05441 • Published Jan 9 • 92
Multi-subject Open-set Personalization in Video Generation Paper • 2501.06187 • Published Jan 10 • 14
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published Jan 10 • 66
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10 • 51