MoAI: Mixture of All Intelligence for Large Language and Vision Models Paper • 2403.07508 • Published Mar 12 • 70 • 6
Simple linear attention language models balance the recall-throughput tradeoff Paper • 2402.18668 • Published Feb 28 • 17 • 12
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 172 • 12
Resonance RoPE: Improving Context Length Generalization of Large Language Models Paper • 2403.00071 • Published Feb 29 • 19 • 2
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry Paper • 2402.04347 • Published Feb 6 • 12 • 3
Hydragen: High-Throughput LLM Inference with Shared Prefixes Paper • 2402.05099 • Published Feb 7 • 16 • 4
World Model on Million-Length Video And Language With RingAttention Paper • 2402.08268 • Published Feb 13 • 33 • 4
Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer Paper • 2310.12442 • Published Oct 19, 2023 • 1 • 1
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition Paper • 2402.15220 • Published Feb 23 • 18 • 6
Cure the headache of Transformers via Collinear Constrained Attention Paper • 2309.08646 • Published Sep 15, 2023 • 12 • 6
Data Engineering for Scaling Language Models to 128K Context Paper • 2402.10171 • Published Feb 15 • 18 • 6
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21 • 104 • 16
Repeat After Me: Transformers are Better than State Space Models at Copying Paper • 2402.01032 • Published Feb 1 • 22 • 3
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty Paper • 2401.15077 • Published Jan 26 • 17 • 6
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning Paper • 2401.01325 • Published Jan 2 • 24 • 2
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models Paper • 2401.06951 • Published Jan 13 • 23 • 2
SliceGPT: Compress Large Language Models by Deleting Rows and Columns Paper • 2401.15024 • Published Jan 26 • 62 • 6