AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts Paper • 2402.07625 • Published Feb 12 • 11
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models Paper • 2402.13064 • Published Feb 20 • 46
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows Paper • 2402.10379 • Published Feb 16 • 29
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15 • 34
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training Paper • 2406.10670 • Published Jun 15 • 4
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models Paper • 2408.02085 • Published Aug 4 • 17
PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment Paper • 2410.13785 • Published 17 days ago • 18