Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 254
Post: Native tensor parallel has landed in transformers! https://github.com/huggingface/transformers/pull/34184 Thanks a lot to the torch team for their support! Contributions are welcome to support more models!
Small-scale proxies for large-scale Transformer training instabilities Paper • 2309.14322 • Published Sep 25, 2023 • 19
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets Paper • 2201.02177 • Published Jan 6, 2022 • 2
Article A failed experiment: Infini-Attention, and why we should keep trying? Aug 14 • 50
Grokfast: Accelerated Grokking by Amplifying Slow Gradients Paper • 2405.20233 • Published May 30 • 6
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 155