Collections

Collections including paper arxiv:2406.09246

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

Paper • 2306.17107 • Published Jun 29, 2023 • 11
On the Hidden Mystery of OCR in Large Multimodal Models

Paper • 2305.07895 • Published May 13, 2023
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 6
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Paper • 2401.15947 • Published Jan 29 • 48

A collection of Audio, Video and Visual LLMs.

myshell-ai/OpenVoice

Text-to-Speech • Updated Apr 24 • 391
OpenVoice

Running • 970
dataautogpt3/ProteusV0.3

Text-to-Image • Updated Feb 12 • 57.5k • 92
ByteDance/SDXL-Lightning

Text-to-Image • Updated Apr 3 • 93k • 1.91k

Vision-Language

SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 6
Woodpecker: Hallucination Correction for Multimodal Large Language Models

Paper • 2310.16045 • Published Oct 24, 2023 • 14
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Paper • 2201.12086 • Published Jan 28, 2022 • 3
ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories

Paper • 2305.15028 • Published May 24, 2023 • 1

LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning

Paper • 2309.06440 • Published Sep 12, 2023 • 9
Robotic Table Tennis: A Case Study into a High Speed Learning System

Paper • 2309.03315 • Published Sep 6, 2023 • 6
Video Language Planning

Paper • 2310.10625 • Published Oct 16, 2023 • 9
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation

Paper • 2311.01455 • Published Nov 2, 2023 • 28
