- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases · 12 authors
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition · 4 authors
- Same Task, More Tokens: The Impact of Input Length on the Reasoning Performance of Large Language Models · 3 authors