Running 2.5k 2.5k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 143
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29, 2024 • 50