MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14 • 37
Subject-driven Text-to-Image Generation via Apprenticeship Learning Paper • 2304.00186 • Published Apr 1, 2023
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? Paper • 2406.13121 • Published Jun 19 • 2
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions Paper • 2403.19651 • Published Mar 28 • 23
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers Paper • 2311.17136 • Published Nov 28, 2023 • 7
From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces Paper • 2306.00245 • Published May 31, 2023
Instruct-Imagen: Image Generation with Multi-modal Instruction Paper • 2401.01952 • Published Jan 3 • 30
Gemini: A Family of Highly Capable Multimodal Models Paper • 2312.11805 • Published Dec 19, 2023 • 45
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding Paper • 2210.03347 • Published Oct 7, 2022 • 3