Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond Paper • 2310.02071 • Published Oct 3, 2023 • 4
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning Paper • 2309.07915 • Published Sep 14, 2023 • 4