SILC: Improving Vision Language Pretraining with Self-Distillation Paper • 2310.13355 • Published Oct 20, 2023
Woodpecker: Hallucination Correction for Multimodal Large Language Models Paper • 2310.16045 • Published Oct 24, 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Paper • 2201.12086 • Published Jan 28, 2022
ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories Paper • 2305.15028 • Published May 24, 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision) Paper • 2310.19773 • Published Oct 30, 2023
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks Paper • 2310.19909 • Published Oct 30, 2023
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans? Paper • 2311.00047 • Published Oct 31, 2023