Collections including paper arxiv:2309.11419

- Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
  Paper • 2310.03502 • Published • 74
- Scalable Diffusion Models with Transformers
  Paper • 2212.09748 • Published • 9
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
  Paper • 2311.15127 • Published • 8
- Learning Transferable Visual Models From Natural Language Supervision
  Paper • 2103.00020 • Published • 8

- MEGA: Multilingual Evaluation of Generative AI
  Paper • 2303.12528 • Published
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
  Paper • 2311.07463 • Published • 13
- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 49
- A Unified View of Masked Image Modeling
  Paper • 2210.10615 • Published

- Kosmos-2: Grounding Multimodal Large Language Models to the World
  Paper • 2306.14824 • Published • 34
- Kosmos-G: Generating Images in Context with Multimodal Large Language Models
  Paper • 2310.02992 • Published • 4
- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 49
- AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
  Paper • 2309.16058 • Published • 53

- Woodpecker: Hallucination Correction for Multimodal Large Language Models
  Paper • 2310.16045 • Published • 13
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
  Paper • 2310.14566 • Published • 23
- SILC: Improving Vision Language Pretraining with Self-Distillation
  Paper • 2310.13355 • Published • 5
- Conditional Diffusion Distillation
  Paper • 2310.01407 • Published • 19

- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 49
- Nougat: Neural Optical Understanding for Academic Documents
  Paper • 2308.13418 • Published • 33
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
  Paper • 2310.08491 • Published • 50
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 570

- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 49
- Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
  Paper • 2311.05698 • Published • 6
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
  Paper • 2311.06242 • Published • 24
- PolyMaX: General Dense Prediction with Mask Transformer
  Paper • 2311.05770 • Published • 6