Maya: An Instruction Finetuned Multilingual Multimodal Model • Paper • 2412.07112 • Published 16 days ago • 25
Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers • Paper • 2404.13594 • Published Apr 21 • 1
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling • Paper • 2409.05395 • Published Sep 9 • 5