- Training Language Models to Self-Correct via Reinforcement Learning
  Paper • 2409.12917 • Published • 139
- Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models
  Paper • 2409.18943 • Published • 29
- From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
  Paper • 2411.16594 • Published • 40
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
Yuan
MinakamiYuki
Collections
1