- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 50
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 17
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 18
- Linear Transformers are Versatile In-Context Learners
  Paper • 2402.14180 • Published • 5
Collections including paper arxiv:2402.00838

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 135
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 37
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 74
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 26

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 135
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 10
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 47
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 41

- Attention Is All You Need
  Paper • 1706.03762 • Published • 35
- You Only Look Once: Unified, Real-Time Object Detection
  Paper • 1506.02640 • Published
- HEp-2 Cell Image Classification with Deep Convolutional Neural Networks
  Paper • 1504.02531 • Published
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
  Paper • 2401.05566 • Published • 23

- TinyLlama: An Open-Source Small Language Model
  Paper • 2401.02385 • Published • 81
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 41
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns
  Paper • 2401.15024 • Published • 62
- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
  Paper • 2401.16380 • Published • 46

- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 38
- Qwen Technical Report
  Paper • 2309.16609 • Published • 30
- GPT-4 Technical Report
  Paper • 2303.08774 • Published • 3
- Gemini: A Family of Highly Capable Multimodal Models
  Paper • 2312.11805 • Published • 44

- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 59
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
  Paper • 2402.00159 • Published • 55
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 74

- Attention Is All You Need
  Paper • 1706.03762 • Published • 35
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 11
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 11

- google/flan-t5-large
  Text2Text Generation • Updated • 738k • 463
- deepseek-ai/deepseek-coder-6.7b-instruct
  Text Generation • Updated • 95k • 306
- Object Recognition as Next Token Prediction
  Paper • 2312.02142 • Published • 11
- colbert-ir/dspy-Oct11-T5-Large-MH-3k-v1
  Text2Text Generation • Updated • 13 • 1