MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 276
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 346
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 203
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations Paper • 2502.05003 • Published Feb 7 • 43
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28, 2024 • 78
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models Paper • 2407.12327 • Published Jul 17, 2024 • 78
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models Paper • 2311.16079 • Published Nov 27, 2023 • 19