Article: MInference 1.0: 10x Faster Million Context Inference with a Single GPU • by liyucheng
Article: How to Optimize TTFT of 8B LLMs with 1M Tokens to 20s • by iofu728
Paper: MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention • arXiv:2407.02490 • published Jul 2024
Paper: LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression • arXiv:2310.06839 • published Oct 10, 2023
Paper: LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models • arXiv:2310.05736 • published Oct 9, 2023
Paper: LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression • arXiv:2403.12968 • published Mar 19, 2024