OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts Paper • 2503.22952 • Published 6 days ago • 17
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published 20 days ago • 81
Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality Mar 4 • 71
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 216
SmolVLM 256M & 500M Collection Collection for models & demos for even smoller SmolVLM release • 12 items • Updated Feb 20 • 72
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 150
Article Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 31
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Paper • 2412.10302 • Published Dec 13, 2024 • 17
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 146