Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) Paper • 2504.03151 • Published 6 days ago • 9
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models Paper • 2504.04718 • Published 3 days ago • 33
Slow-Fast Architecture for Video Multi-Modal Large Language Models Paper • 2504.01328 • Published 8 days ago • 6
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models Paper • 2504.03641 • Published 5 days ago • 12
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving Paper • 2504.02605 • Published 6 days ago • 39
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers Paper • 2504.00502 • Published 8 days ago • 21
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation Paper • 2504.02782 • Published 6 days ago • 54
PaperBench: Evaluating AI's Ability to Replicate AI Research Paper • 2504.01848 • Published 7 days ago • 34
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Paper • 2503.24376 • Published 9 days ago • 36
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning Paper • 2503.16081 • Published 20 days ago • 26
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks Paper • 2503.21696 • Published 13 days ago • 21
Large Language Model Agent: A Survey on Methodology, Applications and Challenges Paper • 2503.21460 • Published 13 days ago • 71
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? Paper • 2503.19990 • Published 15 days ago • 33
Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published 14 days ago • 47