PFPO - a chitanda Collection

Updated about 17 hours ago

Resources for the paper "Preference Optimization for Reasoning with Pseudo Feedback" (ICLR 2025).

Preference Optimization for Reasoning with Pseudo Feedback

Paper • 2411.16345 • Published Nov 25, 2024 • 1

chitanda/mathscale4o-800k

Viewer • Updated about 17 hours ago • 492k • 1

Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing

Paper • 2402.00658 • Published Feb 1, 2024

chitanda/code-synthetic-test-cases

Preview • Updated about 13 hours ago • 2