Democratizing Reasoning Ability: Tailored Learning from Large Language Model Paper • 2310.13332 • Published Oct 20, 2023 • 14
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback Paper • 2309.00267 • Published Sep 1, 2023 • 45