DPO Meets PPO: Reinforced Token Optimization for RLHF — Paper • arXiv:2404.18922 • Published Apr 29, 2024