Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts • Paper • 2309.04354 • Published Sep 8, 2023
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models • Paper • 2309.16414 • Published Sep 28, 2023
MotionLM: Multi-Agent Motion Forecasting as Language Modeling • Paper • 2309.16534 • Published Sep 28, 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation • Paper • 2201.12086 • Published Jan 28, 2022