Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon Paper • 2401.03462 • Published Jan 7, 2024 • 25
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers Paper • 2305.07185 • Published May 12, 2023 • 8
YaRN: Efficient Context Window Extension of Large Language Models Paper • 2309.00071 • Published Aug 31, 2023 • 58
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache Paper • 2401.02669 • Published Jan 5, 2024 • 10
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning Paper • 2401.01325 • Published Jan 2, 2024 • 24
Extending Context Window of Large Language Models via Semantic Compression Paper • 2312.09571 • Published Dec 15, 2023 • 12
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention Paper • 2312.08618 • Published Dec 14, 2023 • 11
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 131
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models Paper • 2401.06951 • Published Jan 13, 2024 • 23
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization Paper • 2401.18079 • Published Jan 31, 2024 • 5
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache Paper • 2402.02750 • Published Feb 5, 2024 • 3
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts Paper • 2402.09727 • Published Feb 15, 2024 • 35
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss Paper • 2402.10790 • Published Feb 16, 2024 • 39
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21, 2024 • 104
Data Engineering for Scaling Language Models to 128K Context Paper • 2402.10171 • Published Feb 15, 2024 • 18
Striped Attention: Faster Ring Attention for Causal Transformers Paper • 2311.09431 • Published Nov 15, 2023 • 4
Ring Attention with Blockwise Transformers for Near-Infinite Context Paper • 2310.01889 • Published Oct 3, 2023 • 8
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10, 2024 • 92
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory Paper • 2402.04617 • Published Feb 7, 2024 • 4
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12, 2024 • 62