SCBench: A KV Cache-Centric Analysis of Long-Context Methods Paper • 2412.10319 • Published 9 days ago • 8
Article MInference 1.0: 10x Faster Million Context Inference with a Single GPU By liyucheng • Jul 11 • 12
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention Paper • 2407.02490 • Published Jul 2 • 23