Simple and Scalable Strategies to Continually Pre-train Large Language Models Paper • 2403.08763 • Published Mar 13 • 49
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs Paper • 2403.20041 • Published Mar 29 • 34
Advancing LLM Reasoning Generalists with Preference Trees Paper • 2404.02078 • Published Apr 2 • 44
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 104
Understanding LLMs: A Comprehensive Overview from Training to Inference Paper • 2401.02038 • Published Jan 4 • 62
SUTRA: Scalable Multilingual Language Model Architecture Paper • 2405.06694 • Published May 7 • 37
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published May 14 • 27
Layer-Condensed KV Cache for Efficient Inference of Large Language Models Paper • 2405.10637 • Published May 17 • 19