ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers Paper • 2504.00502 • Published 6 days ago • 17
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation Paper • 2504.02782 • Published 3 days ago • 50
PaperBench: Evaluating AI's Ability to Replicate AI Research Paper • 2504.01848 • Published 4 days ago • 31
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Paper • 2503.24376 • Published 6 days ago • 35
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning Paper • 2503.16081 • Published 17 days ago • 25
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks Paper • 2503.21696 • Published 10 days ago • 21
Large Language Model Agent: A Survey on Methodology, Applications and Challenges Paper • 2503.21460 • Published 10 days ago • 70
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? Paper • 2503.19990 • Published 12 days ago • 32
Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published 12 days ago • 47
CoLLM: A Large Language Model for Composed Image Retrieval Paper • 2503.19910 • Published 12 days ago • 11
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding Paper • 2503.13964 • Published 20 days ago • 17
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking Paper • 2503.19855 • Published 12 days ago • 25
CoMP: Continual Multimodal Pre-training for Vision Foundation Models Paper • 2503.18931 • Published 13 days ago • 29
Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning Paper • 2503.18406 • Published 14 days ago • 3
Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models Paper • 2503.18923 • Published 13 days ago • 12
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning Paper • 2503.18013 • Published 14 days ago • 18