limuyu011/mc-vsft-sft-llama3_llava_next_8b-mcvqa_v4_11_21_80k-12-15-A100-c8-e1-b4-a4 Image-Text-to-Text • Updated 10 days ago • 9 • 1
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation Paper • 2403.05313 • Published Mar 8, 2024 • 9
RAM: Towards an Ever-Improving Memory System by Learning from Communications Paper • 2404.12045 • Published Apr 18, 2024 • 2
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents Paper • 2407.00114 • Published Jun 27, 2024 • 12
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting Paper • 2410.17856 • Published Oct 23, 2024 • 49
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents Paper • 2302.01560 • Published Feb 3, 2023 • 1
GROOT: Learning to Follow Instructions by Watching Gameplay Videos Paper • 2310.08235 • Published Oct 12, 2023 • 1