- Extending Context Window of Large Language Models via Semantic Compression
  Paper • 2312.09571 • Published • 12
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 40
- LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
  Paper • 2312.02949 • Published • 8
- TinyLLaVA: A Framework of Small-scale Large Multimodal Models
  Paper • 2402.14289 • Published • 16
Collections including paper arxiv:2311.05437
- Visual In-Context Prompting
  Paper • 2311.13601 • Published • 14
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
  Paper • 2308.08155 • Published • 2
- LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models
  Paper • 2303.02927 • Published • 3
- The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4
  Paper • 2311.07361 • Published • 11
- LayoutPrompter: Awaken the Design Ability of Large Language Models
  Paper • 2311.06495 • Published • 9
- Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
  Paper • 2311.06783 • Published • 25
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 40
- TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
  Paper • 2311.04589 • Published • 17
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 40
- On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
  Paper • 2311.05332 • Published • 7
- SoundCam: A Dataset for Finding Humans Using Room Acoustics
  Paper • 2311.03517 • Published • 9
- LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
  Paper • 2311.00571 • Published • 39
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 40
- Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
  Paper • 2310.08166 • Published • 1
- Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
  Paper • 2310.00653 • Published • 3