

Collections including paper arxiv:2404.14687

Papers - Video - Understanding

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Paper • 2403.09626 • Published Mar 14 • 12

VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper • 2403.10517 • Published Mar 15 • 30

VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Paper • 2403.13501 • Published Mar 20 • 9

LITA: Language Instructed Temporal-Localization Assistant
Paper • 2403.19046 • Published Mar 27 • 17

Video as the New Language for Real-World Decision Making
Paper • 2402.17139 • Published Feb 27 • 18

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
Paper • 2310.19512 • Published Oct 30, 2023 • 15

VideoMamba: State Space Model for Efficient Video Understanding
Paper • 2403.06977 • Published Mar 11 • 27

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Paper • 2401.09047 • Published Jan 17 • 13

PsiPi/liuhaotian_llava-v1.5-13b-GGUF
Image-Text-to-Text • Updated Mar 11 • 1.09k • 32

TRI-ML/prismatic-vlms
Image-to-Text • Updated May 6 • 13

bczhou/tiny-llava-v1-hf
Image-Text-to-Text • Updated Aug 17 • 11.8k • 50

ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Paper • 2402.06118 • Published Feb 9 • 13

ChatAnything: Facetime Chat with LLM-Enhanced Personas
Paper • 2311.06772 • Published Nov 12, 2023 • 34

Fine-tuning Language Models for Factuality
Paper • 2311.08401 • Published Nov 14, 2023 • 28

A Survey on Language Models for Code
Paper • 2311.07989 • Published Nov 14, 2023 • 21

Instruction-Following Evaluation for Large Language Models
Paper • 2311.07911 • Published Nov 14, 2023 • 19
