Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model • Paper • 2503.24290 • Published 3 days ago • 52
Understanding R1-Zero-Like Training: A Critical Perspective • Paper • 2503.20783 • Published 8 days ago • 25
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild • Paper • 2503.18892 • Published 10 days ago • 27
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't • Paper • 2503.16219 • Published 14 days ago • 46
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning • Paper • 2503.07572 • Published 24 days ago • 40
DAPO: An Open-Source LLM Reinforcement Learning System at Scale • Paper • 2503.14476 • Published 16 days ago • 112
🧠 Reasoning datasets • Collection • Datasets with reasoning traces for math and code released by the community • 20 items • Updated 3 days ago • 118
OpenR1-Math • Collection • Dataset and SFT model distilled from DeepSeek-R1. Check out our blog post for more details: https://huggingface.co/blog/open-r1/update-2 • 3 items • Updated 23 days ago • 7
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • Paper • 2502.02737 • Published Feb 4 • 216