Illustrating Reinforcement Learning from Human Feedback (RLHF), Dec 9, 2022
Preference Tuning LLMs with Direct Preference Optimization Methods, Jan 18, 2024