Moto: Latent Motion Token as the Bridging Language for Robot Manipulation Paper • 2412.04445 • Published Dec 5, 2024 • 21
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation Paper • 2412.06531 • Published Dec 9, 2024 • 71
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 106
CLEAR: Character Unlearning in Textual and Visual Modalities Paper • 2410.18057 • Published Oct 23, 2024 • 200
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior Paper • 2410.21264 • Published Oct 28, 2024 • 9
Visual Context Window Extension: A New Perspective for Long Video Understanding Paper • 2409.20018 • Published Sep 30, 2024 • 10
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published Sep 19, 2024 • 25
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing Paper • 2409.01322 • Published Sep 2, 2024 • 95
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis Paper • 2407.13301 • Published Jul 18, 2024 • 56
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Paper • 2407.15754 • Published Jul 22, 2024 • 20
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents Paper • 2407.04363 • Published Jul 5, 2024 • 27
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3, 2024 • 93
Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization Paper • 2310.02170 • Published Oct 3, 2023 • 2