Towards Physically Plausible Video Generation via VLM Planning Paper • 2503.23368 • Published 16 days ago • 38
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints Paper • 2503.16408 • Published 26 days ago • 39
WorldSimBench: Towards Video Generation Models as World Simulators Paper • 2410.18072 • Published Oct 23, 2024 • 20
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark Paper • 2306.06687 • Published Jun 11, 2023 • 1
ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models Paper • 2311.02692 • Published Nov 5, 2023 • 1
Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models Paper • 2312.08962 • Published Dec 14, 2023
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities Paper • 2401.15071 • Published Jan 26, 2024 • 38