Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts • arXiv:2309.04354 • Published Sep 8, 2023
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models • arXiv:2309.16414 • Published Sep 28, 2023
MotionLM: Multi-Agent Motion Forecasting as Language Modeling • arXiv:2309.16534 • Published Sep 28, 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation • arXiv:2201.12086 • Published Jan 28, 2022
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models • arXiv:2408.04840 • Published Aug 9, 2024
Seeing and Understanding: Bridging Vision with Chemical Knowledge via ChemVLM • arXiv:2408.07246 • Published Aug 14, 2024