Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment Paper • 2401.12474 • Published Jan 23 • 35
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression Paper • 2403.12968 • Published Mar 19 • 24
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models Paper • 2310.00746 • Published Oct 1, 2023 • 1
LESS: Selecting Influential Data for Targeted Instruction Tuning Paper • 2402.04333 • Published Feb 6 • 3
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only Paper • 2306.01116 • Published Jun 1, 2023 • 31
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 86
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28 • 95
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12 • 65
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic Paper • 2407.18129 • Published Jul 25 • 11
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge Paper • 2407.19594 • Published Jul 28 • 20
CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models Paper • 2410.18505 • Published Oct 24 • 8
Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models Paper • 2404.02823 • Published Apr 3 • 2