- Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
  Paper • 2403.09626 • Published • 12
- VideoAgent: Long-form Video Understanding with Large Language Model as Agent
  Paper • 2403.10517 • Published • 30
- VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
  Paper • 2403.13501 • Published • 9
- LITA: Language Instructed Temporal-Localization Assistant
  Paper • 2403.19046 • Published • 17
Collections including paper arxiv:2404.14687

- Video as the New Language for Real-World Decision Making
  Paper • 2402.17139 • Published • 18
- VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
  Paper • 2310.19512 • Published • 15
- VideoMamba: State Space Model for Efficient Video Understanding
  Paper • 2403.06977 • Published • 27
- VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
  Paper • 2401.09047 • Published • 13

- PsiPi/liuhaotian_llava-v1.5-13b-GGUF
  Image-Text-to-Text • Updated • 1.09k • 32
- TRI-ML/prismatic-vlms
  Image-to-Text • Updated • 13
- bczhou/tiny-llava-v1-hf
  Image-Text-to-Text • Updated • 11.8k • 50
- ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
  Paper • 2402.06118 • Published • 13

- ChatAnything: Facetime Chat with LLM-Enhanced Personas
  Paper • 2311.06772 • Published • 34
- Fine-tuning Language Models for Factuality
  Paper • 2311.08401 • Published • 28
- A Survey on Language Models for Code
  Paper • 2311.07989 • Published • 21
- Instruction-Following Evaluation for Large Language Models
  Paper • 2311.07911 • Published • 19