- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 119
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 45
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 9
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 62
Collections including paper arxiv:2402.05472
- Question Aware Vision Transformer for Multimodal Reasoning
  Paper • 2402.05472 • Published • 6
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 31
- WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
  Paper • 2402.05930 • Published • 35
- More Agents Is All You Need
  Paper • 2402.05120 • Published • 46

- Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
  Paper • 2311.06243 • Published • 17
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
  Paper • 2311.05908 • Published • 11
- PolyMaX: General Dense Prediction with Mask Transformer
  Paper • 2311.05770 • Published • 6
- SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
  Paper • 2311.07575 • Published • 10

- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 48
- Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
  Paper • 2311.05698 • Published • 6
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
  Paper • 2311.06242 • Published • 24
- PolyMaX: General Dense Prediction with Mask Transformer
  Paper • 2311.05770 • Published • 6