WildChat-50m Collection All model responses associated with the WildChat-50m paper. • 55 items • Updated Jan 29 • 8
Whisper Release Collection Whisper includes both English-only and multilingual checkpoints for ASR and ST, ranging from 38M params for the tiny models to 1.5B params for large. • 12 items • Updated Sep 13, 2023 • 99
SWE-bench Collection SWE-bench is a benchmark for evaluating Language Models and AI Systems on their ability to resolve real-world GitHub Issues. • 4 items • Updated 17 days ago • 3
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 17 items • Updated 5 days ago • 111
MAmmoTH2 Collection Scaling up instruction data from the web to build better LLMs • 13 items • Updated Dec 9, 2024 • 11
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24, 2024 • 58
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 654
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action Paper • 2312.17172 • Published Dec 28, 2023 • 28