VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment • Paper • 2410.01679 • Published Oct 2 • 22
Improving Context-Aware Preference Modeling for Language Models • Paper • 2407.14916 • Published Jul 20 • 4