MV-Adapter: Multi-view Consistent Image Generation Made Easy Paper • 2412.03632 • Published 22 days ago • 22
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection Paper • 2412.04455 • Published 21 days ago • 35
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation Paper • 2412.03558 • Published 22 days ago • 15
WorldSimBench: Towards Video Generation Models as World Simulators Paper • 2410.18072 • Published Oct 23, 2024 • 18
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion Paper • 2406.03184 • Published Jun 5, 2024 • 19
CityDreamer: Compositional Generative Model of Unbounded 3D Cities Paper • 2309.00610 • Published Sep 1, 2023 • 18
ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models Paper • 2311.02692 • Published Nov 5, 2023 • 1
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark Paper • 2306.06687 • Published Jun 11, 2023 • 1
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception Paper • 2312.07472 • Published Dec 12, 2023 • 2
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities Paper • 2401.15071 • Published Jan 26, 2024 • 35