CoRe^2: Collect, Reflect and Refine to Generate Better and Faster • Paper • 2503.09662 • Published 2 days ago • 27
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond • Paper • 2503.10460 • Published 1 day ago • 10
WARM: On the Benefits of Weight Averaged Reward Models • Paper • 2401.12187 • Published Jan 22, 2024 • 19
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning • Paper • 2503.07572 • Published 4 days ago • 31
Implicit Reasoning in Transformers is Reasoning through Shortcuts • Paper • 2503.07604 • Published 4 days ago • 17
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL • Paper • 2503.07536 • Published 4 days ago • 73
bartowski/OpenPipe_Deductive-Reasoning-Qwen-32B-GGUF • Text Generation • Updated 4 days ago • 1.7k • 17