A Unified Sequence Parallelism Approach for Long Context Generative AI • Paper • 2405.07719 • Published May 2024
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory • Paper • 2405.08707 • Published May 2024
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention • Paper • 2405.04437 • Published May 2024
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing • Paper • 2306.12929 • Published Jun 22, 2023
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework • Paper • 2404.14619 • Published Apr 2024
Meta Llama 3 • Collection • Hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Apr 18, 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length • Paper • 2404.08801 • Published Apr 12, 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention • Paper • 2404.07143 • Published Apr 10, 2024
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies • Paper • 2404.06395 • Published Apr 9, 2024
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens • Paper • 2404.03413 • Published Apr 4, 2024
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models • Paper • 2404.02258 • Published Apr 2, 2024
Rethinking Memory and Communication Cost for Efficient Large Language Model Training • Paper • 2310.06003 • Published Oct 9, 2023
Optimized Network Architectures for Large Language Model Training with Billions of Parameters • Paper • 2307.12169 • Published Jul 22, 2023
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference • Paper • 2403.14520 • Published Mar 21, 2024
Unicron: Economizing Self-Healing LLM Training at Scale • Paper • 2401.00134 • Published Dec 30, 2023
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens • Paper • 2402.13753 • Published Feb 21, 2024
Recurrent Drafter for Fast Speculative Decoding in Large Language Models • Paper • 2403.09919 • Published Mar 14, 2024
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning • Paper • 2401.01325 • Published Jan 2, 2024
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey • Paper • 2311.12351 • Published Nov 21, 2023
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache • Paper • 2401.02669 • Published Jan 5, 2024
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models • Paper • 2402.02244 • Published Feb 3, 2024
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences • Paper • 2403.09347 • Published Mar 14, 2024
LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers • Paper • 2310.03294 • Published Oct 5, 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models • Paper • 2309.14509 • Published Sep 25, 2023
Striped Attention: Faster Ring Attention for Causal Transformers • Paper • 2311.09431 • Published Nov 15, 2023
Ring Attention with Blockwise Transformers for Near-Infinite Context • Paper • 2310.01889 • Published Oct 3, 2023
Sequence Parallelism: Long Sequence Training from System Perspective • Paper • 2105.13120 • Published May 26, 2021
DeepSeek-VL: Towards Real-World Vision-Language Understanding • Paper • 2403.05525 • Published Mar 8, 2024
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context • Paper • 2403.05530 • Published Mar 8, 2024
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs • Paper • 2402.15627 • Published Feb 23, 2024
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research • Paper • 2402.00159 • Published Jan 31, 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads • Paper • 2401.10774 • Published Jan 19, 2024
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models • Paper • 2401.15947 • Published Jan 29, 2024