Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published 9 days ago • 81
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs Paper • 2410.16267 • Published 10 days ago • 14
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published 10 days ago • 42
LLaVA-Video Collection Models focused on video understanding (previously known as LLaVA-NeXT-Video). • 6 items • Updated 26 days ago • 50
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 27 items • Updated about 23 hours ago • 484
NVLM 1.0 Collection A family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks and text-only tasks. • 1 item • Updated about 1 month ago • 47
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published 17 days ago • 35
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published 17 days ago • 49
Falcon Mamba: The First Competitive Attention-free 7B Language Model Paper • 2410.05355 • Published 24 days ago • 26
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Paper • 2409.20566 • Published about 1 month ago • 51
ComiCap: A VLMs pipeline for dense captioning of Comic Panels Paper • 2409.16159 • Published Sep 24 • 1
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models Paper • 2408.02442 • Published Aug 5 • 19