PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models Paper • 2503.12545 • Published 3 days ago • 4
Aligning Multimodal LLM with Human Preference: A Survey Paper • 2503.14504 • Published 1 day ago • 10
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification Paper • 2503.12505 • Published 3 days ago • 8
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding Paper • 2503.12797 • Published 3 days ago • 24
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM Paper • 2503.14478 • Published 1 day ago • 37
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published 3 days ago • 23
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning Paper • 2503.13444 • Published 2 days ago • 12
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning Paper • 2503.11495 • Published 5 days ago • 11
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey Paper • 2503.12605 • Published 3 days ago • 24
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research Paper • 2503.13399 • Published 2 days ago • 19
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Paper • 2503.12533 • Published 3 days ago • 56
ARMOR v0.1: Empowering Autoregressive Multimodal Understanding Model with Interleaved Multimodal Generation via Asymmetric Synergy Paper • 2503.06542 • Published 10 days ago • 7
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Paper • 2503.10615 • Published 6 days ago • 16
On the Limitations of Vision-Language Models in Understanding Image Transforms Paper • 2503.09837 • Published 7 days ago • 10
TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention Paper • 2503.10602 • Published 6 days ago • 4
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published 6 days ago • 18
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Paper • 2503.10291 • Published 6 days ago • 31
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Paper • 2503.10639 • Published 6 days ago • 45
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning Paper • 2503.05379 • Published 12 days ago • 32
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering Paper • 2503.06492 • Published 11 days ago • 9