Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12 • 64
MagpieLM Collection Aligning LMs with Fully Open Recipe (data+training configs+logs) • 9 items • Updated Sep 22 • 15
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch Paper • 2410.18693 • Published 7 days ago • 35
ScaleQuest Collection We introduce ScaleQuest, a scalable and novel data synthesis method. Project Page: https://scalequest.github.io/ • 8 items • Updated 6 days ago • 4
C4AI Aya Expanse Collection Aya Expanse is an open-weight research release of a model with highly advanced multilingual capabilities. • 3 items • Updated 6 days ago • 25
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29 • 47
Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models Mar 20 • 63
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates Paper • 2410.07137 • Published 21 days ago • 6
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published Sep 25 • 59
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23 • 21
Power-LM Collection Dense & MoE LLMs trained with power learning rate scheduler. • 4 items • Updated 13 days ago • 15
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Sep 25 • 606
Model with Circuit Breakers Collection SoTA models with circuit breakers inserted. Top safety performance without losing capabilities. • 3 items • Updated 5 days ago • 3
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18 • 52
Article RegMix: Data Mixture as Regression for Language Model Pre-training By SivilTaram • Jul 11 • 10
RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published Jul 1 • 33