InternChat: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language Paper • 2305.05662 • Published May 9, 2023 • 4
Learning Human Motion Representations: A Unified Perspective Paper • 2210.06551 • Published Oct 12, 2022
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks Paper • 2406.08394 • Published Jun 12, 2024
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions Paper • 2407.20962 • Published Jul 30, 2024
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published 27 days ago • 122