Preference Optimization for Reasoning with Pseudo Feedback Paper • 2411.16345 • Published Nov 25, 2024
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing Paper • 2402.00658 • Published Feb 1, 2024