Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model • Paper • 2503.24290 • Published 5 days ago • 56
If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs • Paper • 2412.04144 • Published Dec 5, 2024 • 5
Source-Aware Training Enables Knowledge Attribution in Language Models • Paper • 2404.01019 • Published Apr 1, 2024 • 1
Discriminator-Guided Multi-step Reasoning with Language Models • Paper • 2305.14934 • Published May 24, 2023 • 1