Selective Visual Representations Improve Convergence and Generalization for Embodied AI Paper • 2311.04193 • Published Nov 7, 2023
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics Paper • 2406.10721 • Published Jun 15, 2024 • 1
SAT: Spatial Aptitude Training for Multimodal Language Models Paper • 2412.07755 • Published Dec 10, 2024
SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation Paper • 2501.18564 • Published Jan 30, 2025 • 1
THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation Paper • 2402.08191 • Published Feb 13, 2024
From Mystery to Mastery: Failure Diagnosis for Improving Manipulation Policies Paper • 2412.02818 • Published Dec 3, 2024
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation Paper • 2410.00371 • Published Oct 1, 2024