PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published 17 days ago • 118
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models Paper • 2407.19474 • Published Jul 28 • 23
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision Paper • 2407.06189 • Published Jul 8 • 24
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17 • 50
VideoCon: Robust Video-Language Alignment via Contrast Captions Paper • 2311.10111 • Published Nov 15, 2023 • 7
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use Paper • 2308.06595 • Published Aug 12, 2023 • 5
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models Paper • 2308.01390 • Published Aug 2, 2023 • 32
What You See is What You Read? Improving Text-Image Alignment Evaluation Paper • 2305.10400 • Published May 17, 2023 • 2
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images Paper • 2303.07274 • Published Mar 13, 2023 • 2