Hymba: A Hybrid-head Architecture for Small Language Models • Paper • 2411.13676 • Published 2 days ago
PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation • Paper • 2410.01680 • Published Oct 2
LLM Pruning and Distillation in Practice: The Minitron Approach • Paper • 2408.11796 • Published Aug 21