FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models • Paper • 2407.11522 • Published Jul 16, 2024
Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World • Paper • 2310.10207 • Published Oct 16, 2023
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding • Paper • 2401.09340 • Published Jan 17, 2024
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment • Paper • 2308.04352 • Published Aug 8, 2023