FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning Paper • 2510.22543 • Published 19 days ago
SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning Paper • 2509.16548 • Published Sep 20
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch Paper • 2410.18693 • Published Oct 24, 2024