Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models • Mar 20
SelectLLM: Can LLMs Select Important Instructions to Annotate? Paper • 2401.16553 • Published Jan 29
Diversity Measurement and Subset Selection for Instruction Tuning Datasets Paper • 2402.02318 • Published Feb 4
LESS: Selecting Influential Data for Targeted Instruction Tuning Paper • 2402.04333 • Published Feb 6
Self-Instruct: Aligning Language Model with Self Generated Instructions Paper • 2212.10560 • Published Dec 20, 2022
WizardLM: Empowering Large Language Models to Follow Complex Instructions Paper • 2304.12244 • Published Apr 24, 2023