Hymba: A Hybrid-head Architecture for Small Language Models • Paper 2411.13676 • Published 21 days ago
PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation • Paper 2410.01680 • Published Oct 2