The Ultra-Scale Playbook: The ultimate guide to training LLMs on large GPU clusters
Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding Paper • 2502.05609 • Published Feb 8 • 18