Collections
Collections including paper arxiv:2404.02258

Collection 1
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 102
- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 139
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 99
- Large Language Models Struggle to Learn Long-Tail Knowledge
  Paper • 2211.08411 • Published • 3

Collection 2
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 102
- Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
  Paper • 2403.20041 • Published • 34
- ViTAR: Vision Transformer with Any Resolution
  Paper • 2403.18361 • Published • 48

Collection 3
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 75
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 102
- ReFT: Representation Finetuning for Language Models
  Paper • 2404.03592 • Published • 74
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 58
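
Listings like the ones above can also be retrieved programmatically. The sketch below uses the huggingface_hub client's list_collections filter to find collections containing this paper; the "papers/<arxiv id>" item format is an assumption based on the Hub's URL namespaces, so verify it against the library documentation for your installed version.

```python
# Minimal sketch: list public Hub collections that include the
# Mixture-of-Depths paper (arxiv:2404.02258) via huggingface_hub.
# Assumption: the `item` filter accepts "papers/<arxiv id>"; check the
# list_collections docs for your huggingface_hub version before relying on it.
from huggingface_hub import list_collections

for collection in list_collections(item="papers/2404.02258", limit=10):
    print(collection.title, f"({collection.slug})")
    # Each returned collection carries a truncated preview of its items.
    for item in collection.items:
        if item.item_type == "paper":
            print("  Paper •", item.item_id)
```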