Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.16994

Papers - Image - LLaVA

OmniFusion Technical Report

Paper • 2404.06212 • Published Apr 9 • 72
On Speculative Decoding for Multimodal Large Language Models

Paper • 2404.08856 • Published Apr 13 • 13
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Paper • 2404.16994 • Published Apr 25 • 34

Vision-Language Model

Visual Instruction Tuning

Paper • 2304.08485 • Published Apr 17, 2023 • 10
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 6
Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 35
SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 6

Papers - Video - Captions

Streaming Dense Video Captioning

Paper • 2404.01297 • Published Apr 1 • 11
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Paper • 2404.16994 • Published Apr 25 • 34
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 69

Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

Paper • 2403.12596 • Published Mar 19 • 9
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Paper • 2404.13013 • Published Apr 19 • 28
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Paper • 2404.16994 • Published Apr 25 • 34
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability

Paper • 2405.14129 • Published May 23 • 9

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15 • 29
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

Paper • 2403.11481 • Published Mar 18 • 11
VideoMamba: State Space Model for Efficient Video Understanding

Paper • 2403.06977 • Published Mar 11 • 26
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

Paper • 2403.01422 • Published Mar 3 • 25

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Paper • 2403.08763 • Published Mar 13 • 48
Language models scale reliably with over-training and on downstream tasks

Paper • 2403.08540 • Published Mar 13 • 14
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Paper • 2404.16994 • Published Apr 25 • 34

FAX: Scalable and Differentiable Federated Primitives in JAX

Paper • 2403.07128 • Published Mar 11 • 11
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

Paper • 2403.12895 • Published Mar 19 • 29
Measuring Style Similarity in Diffusion Models

Paper • 2404.01292 • Published Apr 1 • 14
JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11 • 32

FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

Paper • 2403.06775 • Published Mar 11 • 3
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Paper • 2010.11929 • Published Oct 22, 2020 • 6
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition

Paper • 2110.07040 • Published Oct 13, 2021 • 2
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks

Paper • 1811.00056 • Published Oct 31, 2018 • 2

Video as the New Language for Real-World Decision Making

Paper • 2402.17139 • Published Feb 27 • 18
Learning and Leveraging World Models in Visual Representation Learning

Paper • 2403.00504 • Published Mar 1 • 28
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

Paper • 2403.01422 • Published Mar 3 • 25
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

Paper • 2403.05438 • Published Mar 8 • 16

Video as the New Language for Real-World Decision Making

Paper • 2402.17139 • Published Feb 27 • 18
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

Paper • 2310.19512 • Published Oct 30, 2023 • 14
VideoMamba: State Space Model for Efficient Video Understanding

Paper • 2403.06977 • Published Mar 11 • 26
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Paper • 2401.09047 • Published Jan 17 • 13

Previous
1
2
3
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs