- VideoAgent: Long-form Video Understanding with Large Language Model as Agent
  Paper • 2403.10517 • Published • 28
- VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
  Paper • 2403.11481 • Published • 10
- VideoMamba: State Space Model for Efficient Video Understanding
  Paper • 2403.06977 • Published • 22
- MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
  Paper • 2403.01422 • Published • 24
Collections
Collections including paper arxiv:2402.17139
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 122
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 45
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 9
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 62

- Video as the New Language for Real-World Decision Making
  Paper • 2402.17139 • Published • 17
- VideoPrism: A Foundational Visual Encoder for Video Understanding
  Paper • 2402.13217 • Published • 18
- World Model on Million-Length Video And Language With RingAttention
  Paper • 2402.08268 • Published • 33
- microsoft/xclip-base-patch16-zero-shot
  Video Classification • Updated • 4.32k • 20

- Video as the New Language for Real-World Decision Making
  Paper • 2402.17139 • Published • 17
- Learning and Leveraging World Models in Visual Representation Learning
  Paper • 2403.00504 • Published • 25
- MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
  Paper • 2403.01422 • Published • 24
- VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
  Paper • 2403.05438 • Published • 14

- Visual In-Context Prompting
  Paper • 2311.13601 • Published • 14
- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 139
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
  Paper • 2308.08155 • Published • 2
- LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models
  Paper • 2303.02927 • Published • 3

- Video as the New Language for Real-World Decision Making
  Paper • 2402.17139 • Published • 17
- VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
  Paper • 2310.19512 • Published • 14
- VideoMamba: State Space Model for Efficient Video Understanding
  Paper • 2403.06977 • Published • 22
- VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
  Paper • 2401.09047 • Published • 13

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 135
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 10
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 47
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 41