PAVE: Patching and Adapting Video Large Language Models Paper • 2503.19794 • Published 30 days ago • 4
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding Paper • 2412.00493 • Published Nov 30, 2024 • 17
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning Paper • 2412.03248 • Published Dec 4, 2024 • 28
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation Paper • 2311.07562 • Published Nov 13, 2023 • 14
Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations Paper • 2303.17839 • Published Mar 31, 2023
Learning Concise and Descriptive Attributes for Visual Recognition Paper • 2308.03685 • Published Aug 7, 2023