In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss • arXiv:2402.10790 • Published Feb 16, 2024
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration • arXiv:2402.11550 • Published Feb 18, 2024
Data Engineering for Scaling Language Models to 128K Context • arXiv:2402.10171 • Published Feb 15, 2024
World Model on Million-Length Video And Language With RingAttention • arXiv:2402.08268 • Published Feb 13, 2024
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length • arXiv:2310.00576 • Published Oct 1, 2023
Ring Attention with Blockwise Transformers for Near-Infinite Context • arXiv:2310.01889 • Published Oct 3, 2023
Extending Context Window of Large Language Models via Positional Interpolation • arXiv:2306.15595 • Published Jun 27, 2023
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences • arXiv:2403.09347 • Published Mar 14, 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention • arXiv:2404.07143 • Published Apr 10, 2024
RULER: What's the Real Context Size of Your Long-Context Language Models? • arXiv:2404.06654 • Published Apr 9, 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length • arXiv:2404.08801 • Published Apr 12, 2024
Length Generalization of Causal Transformers without Position Encoding • arXiv:2404.12224 • Published Apr 18, 2024