Phantom of Latent for Large Language and Vision Models Paper • 2409.14713 • Published 5 days ago • 26
Key Insights into the Law of Vision Representations in MLLMs Article • By Borise • 26 days ago • 16
Qwen2-VL Collection Vision-language model series based on Qwen2 • 15 items • Updated 10 days ago • 125
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 96
Chameleon Collection Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. • 2 items • Updated Jul 9 • 25
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated Paper • 2407.10969 • Published Jul 15 • 20
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper • 2403.05530 • Published Mar 8 • 59