ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024
Teaching Large Language Models to Reason with Reinforcement Learning Paper • 2403.04642 • Published Mar 7, 2024
Best Practices and Lessons Learned on Synthetic Data for Language Models Paper • 2404.07503 • Published Apr 2024
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks Paper • 2404.14723 • Published Apr 2024