Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting (arXiv:2404.18911, published Apr 29, 2024)
Accelerating LLM Inference with Staged Speculative Decoding (arXiv:2308.04623, published Aug 8, 2023)
An Emulator for Fine-Tuning Large Language Models using Small Language Models (arXiv:2310.12962, published Oct 19, 2023)
On Speculative Decoding for Multimodal Large Language Models (arXiv:2404.08856, published Apr 13, 2024)
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding (arXiv:2404.11912, published Apr 18, 2024)
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification (arXiv:2305.09781, published May 16, 2023)
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding (arXiv:2404.16710, published Apr 25, 2024)
Better & Faster Large Language Models via Multi-token Prediction (arXiv:2404.19737, published Apr 30, 2024)
GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding (arXiv:2402.02082, published Feb 3, 2024)