Collections
Discover the best community collections!
Collections including paper arxiv:2309.16058
-
Kosmos-2: Grounding Multimodal Large Language Models to the World
Paper • 2306.14824 • Published • 34 -
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Paper • 2310.02992 • Published • 4 -
Kosmos-2.5: A Multimodal Literate Model
Paper • 2309.11419 • Published • 50 -
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Paper • 2309.16058 • Published • 55
-
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper • 2310.16045 • Published • 14 -
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Paper • 2310.14566 • Published • 25 -
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper • 2310.13355 • Published • 6 -
Conditional Diffusion Distillation
Paper • 2310.01407 • Published • 20
-
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Paper • 2309.16058 • Published • 55 -
OneLLM: One Framework to Align All Modalities with Language
Paper • 2312.03700 • Published • 20 -
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Paper • 2402.07865 • Published • 12 -
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
Paper • 2401.08740 • Published • 12