LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale Paper • 2504.16030 • Published 2 days ago • 20
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis Paper • 2504.12322 • Published 13 days ago • 27
MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space Paper • 2504.13835 • Published 6 days ago • 35
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs Paper • 2504.15280 • Published 3 days ago • 18
Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Feb 4 • 144
An Empirical Study of GPT-4o Image Generation Capabilities Paper • 2504.05979 • Published 16 days ago • 61
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 16 days ago • 151
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 17 days ago • 171
Black Swan (Abductive and Defeasible Reasoning) Collection Data for CVPR 2025 paper, "Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events" • 3 items • Updated Mar 22 • 2
MedSAM2: Segment Anything in 3D Medical Images and Videos Paper • 2504.03600 • Published 20 days ago • 8