Aligning Teacher with Student Preferences for Tailored Training Data Generation Paper • 2406.19227 • Published Jun 27
Pre-training Distillation for Large Language Models: A Design Space Exploration Paper • 2410.16215 • Published Oct 21
MiniPLM: Knowledge Distillation for Pre-Training Language Models Paper • 2410.17215 • Published about 1 month ago