A Gradient Perspective on RLVR Stability and Winner Advantage Policy Optimization Paper • 2606.16154 • Published 4 days ago • 6