faster-decoding - a mtc Collection

updated Jan 22

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Paper • 2401.10774 • Published Jan 19