SnapKV: LLM Knows What You are Looking for Before Generation • Paper • 2404.14469 • Published Apr 22, 2024 • 23
JetMoE: Reaching Llama2 Performance with 0.1M Dollars • Paper • 2404.07413 • Published Apr 11, 2024 • 36
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models • Paper • 2402.19481 • Published Feb 29, 2024 • 20
BitDelta: Your Fine-Tune May Only Be Worth One Bit • Paper • 2402.10193 • Published Feb 15, 2024 • 19
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads • Paper • 2401.10774 • Published Jan 19, 2024 • 54