SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices • arXiv:2406.02532 • Published Jun 4, 2024
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression • arXiv:2306.03078 • Published Jun 5, 2023
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding • arXiv:2402.12374 • Published Feb 19, 2024
Distributed Inference and Fine-tuning of Large Language Models Over The Internet • arXiv:2312.08361 • Published Dec 13, 2023