laion/CLIP-ViT-bigG-14-laion2B-39B-b160k • Zero-Shot Image Classification • Updated Jan 22 • 304k • 260
Maya: An Instruction Finetuned Multilingual Multimodal Model • Paper • 2412.07112 • Published Dec 10, 2024 • 29
Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers • Paper • 2404.13594 • Published Apr 21, 2024 • 1
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling • Paper • 2409.05395 • Published Sep 9, 2024 • 5