BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions Paper • 2411.07461 • Published Nov 12 • 21
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20 • 41
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20 • 58
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 98
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations Paper • 2408.08459 • Published Aug 15 • 45
XGen-MM-1 models and datasets Collection A collection of all XGen-MM (Foundation LMM) models! • 16 items • Updated 3 days ago • 35
Gemma 2 2B Release Collection The 2.6B parameter version of Gemma 2. • 6 items • Updated 10 days ago • 77
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24 • 56
4M Tokenizers Collection Multimodal tokenizers from https://4m.epfl.ch/ • 12 items • Updated Jun 14 • 5
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens Paper • 2406.11271 • Published Jun 17 • 20
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens Paper • 2401.09985 • Published Jan 18 • 15
Tiny Series Collection Tiny datasets that empower the foundation of Small Language Models! • 11 items • Updated Jan 26 • 36
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models Paper • 2308.01390 • Published Aug 2, 2023 • 32
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs Paper • 2306.17842 • Published Jun 30, 2023 • 9