Sequence Parallelism: Long Sequence Training from System Perspective • arXiv:2105.13120 • Published May 26, 2021
Ring Attention with Blockwise Transformers for Near-Infinite Context • arXiv:2310.01889 • Published Oct 3, 2023
Striped Attention: Faster Ring Attention for Causal Transformers • arXiv:2311.09431 • Published Nov 15, 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models • arXiv:2309.14509 • Published Sep 25, 2023
LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers • arXiv:2310.03294 • Published Oct 5, 2023
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences • arXiv:2403.09347 • Published Mar 14, 2024
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models • arXiv:2402.02244 • Published Feb 3, 2024
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache • arXiv:2401.02669 • Published Jan 5, 2024
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey • arXiv:2311.12351 • Published Nov 21, 2023
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning • arXiv:2401.01325 • Published Jan 2, 2024
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens • arXiv:2402.13753 • Published Feb 21, 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention • arXiv:2404.07143 • Published Apr 10, 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length • arXiv:2404.08801 • Published Apr 12, 2024
A Unified Sequence Parallelism Approach for Long Context Generative AI • arXiv:2405.07719 • Published May 13, 2024
YaRN: Efficient Context Window Extension of Large Language Models • arXiv:2309.00071 • Published Aug 31, 2023
LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism • arXiv:2406.18485 • Published Jun 26, 2024