Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published 5 days ago • 46
Language models scale reliably with over-training and on downstream tasks Paper • 2403.08540 • Published Mar 13 • 13
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences Paper • 2403.09347 • Published Mar 14 • 20
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14 • 54
Gemma: Open Models Based on Gemini Research and Technology Paper • 2403.08295 • Published Mar 13 • 41
Simple and Scalable Strategies to Continually Pre-train Large Language Models Paper • 2403.08763 • Published Mar 13 • 48
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper • 2403.05530 • Published Mar 8 • 49
MoAI: Mixture of All Intelligence for Large Language and Vision Models Paper • 2403.07508 • Published Mar 12 • 69
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12 • 37
Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks Paper • 2403.05185 • Published Mar 8 • 19
Orca 2: Teaching Small Language Models How to Reason Paper • 2311.11045 • Published Nov 18, 2023 • 68