Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 3 days ago • 52 • 2
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation Paper • 2503.04606 • Published 2 days ago • 6 • 1
Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks Paper • 2503.04378 • Published 3 days ago • 6 • 3
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Paper • 2503.03751 • Published 3 days ago • 17 • 4
Diffusion Self Distillation 🦀 Generate detailed images from an input image and text prompt • Running on Zero • 69
Light-R1 Collection Surpassing R1-Distill from Scratch* with 70k Math Data through Curriculum SFT & DPO • 3 items • Updated 5 days ago • 8
A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality Article • 5 days ago • 56
C4AI Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 5 days ago • 58