- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 50
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 17
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 18
- Linear Transformers are Versatile In-Context Learners
  Paper • 2402.14180 • Published • 5

Collections including paper arxiv:2404.08801

- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
  Paper • 2401.10774 • Published • 50
- APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
  Paper • 2401.06761 • Published • 1
- Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
  Paper • 2401.02669 • Published • 11
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 47

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 135
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 10
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 47
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 41

- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
  Paper • 2401.03462 • Published • 25
- MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
  Paper • 2305.07185 • Published • 8
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 59
- Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
  Paper • 2401.02669 • Published • 11

- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 73
- Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
  Paper • 2404.10719 • Published • 3
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 62
- Pre-training Small Base LMs with Fewer Tokens
  Paper • 2404.08634 • Published • 32

- The Impact of Depth and Width on Transformer Language Model Generalization
  Paper • 2310.19956 • Published • 9
- Retentive Network: A Successor to Transformer for Large Language Models
  Paper • 2307.08621 • Published • 167
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 10
- Attention Is All You Need
  Paper • 1706.03762 • Published • 36

- TRAMS: Training-free Memory Selection for Long-range Language Modeling
  Paper • 2310.15494 • Published • 1
- A Long Way to Go: Investigating Length Correlations in RLHF
  Paper • 2310.03716 • Published • 9
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 59
- Giraffe: Adventures in Expanding Context Lengths in LLMs
  Paper • 2308.10882 • Published • 1

- Statistical Rejection Sampling Improves Preference Optimization
  Paper • 2309.06657 • Published • 13
- In-Context Learning Creates Task Vectors
  Paper • 2310.15916 • Published • 39
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 62
- Make Your LLM Fully Utilize the Context
  Paper • 2404.16811 • Published • 52