mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data Paper • 2502.08468 • Published 2 days ago • 6
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Paper • 2502.09621 • Published 1 day ago • 12
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents Paper • 2502.09560 • Published 1 day ago • 20
SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation Paper • 2502.08168 • Published 3 days ago • 10
Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published 3 days ago • 23
Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents Paper • 2502.04223 • Published 8 days ago • 9
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models Paper • 2502.06788 • Published 4 days ago • 11
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering Paper • 2502.03628 • Published 9 days ago • 11
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation Paper • 2502.05415 • Published 7 days ago • 16