Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models Paper • 2502.16033 • Published 21 days ago • 16
JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents Paper • 2208.13266 • Published Aug 28, 2022 • 1
Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA Paper • 2401.15847 • Published Jan 29, 2024 • 2
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12, 2024 • 28
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models Paper • 2310.03903 • Published Oct 5, 2023
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding Paper • 2406.19263 • Published Jun 27, 2024 • 10