MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 8 items • Updated 17 days ago • 95
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25 • 74