Hydragen: High-Throughput LLM Inference with Shared Prefixes Paper • 2402.05099 • Published Feb 7 • 17
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting Paper • 2402.13720 • Published Feb 21 • 4
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21 • 23