Bigger is not Always Better: Scaling Properties of Latent Diffusion Models • Paper • 2404.01367 • Published Apr 1 • 19
On the Scalability of Diffusion-based Text-to-Image Generation • Paper • 2404.02883 • Published Apr 3 • 17
Learning Transferable Visual Models From Natural Language Supervision • Paper • 2103.00020 • Published Feb 26, 2021 • 7
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion • Paper • 2310.03502 • Published Oct 5, 2023 • 74
Transferable and Principled Efficiency for Open-Vocabulary Segmentation • Paper • 2404.07448 • Published 28 days ago • 8
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models • Paper • 2404.07973 • Published 28 days ago • 28
RegionGPT: Towards Region Understanding Vision Language Model • Paper • 2403.02330 • Published Mar 4 • 2
On Speculative Decoding for Multimodal Large Language Models • Paper • 2404.08856 • Published 27 days ago • 9
MultiBooth: Towards Generating All Your Concepts in an Image from Text • Paper • 2404.14239 • Published 17 days ago • 7
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data • Paper • 2404.15653 • Published 15 days ago • 24
DOCCI: Descriptions of Connected and Contrasting Images • Paper • 2404.19753 • Published 9 days ago • 5
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation • Paper • 2404.19427 • Published 9 days ago • 60