Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons Paper • 2301.11270 • Published Jan 26, 2023
Online Learning in Stackelberg Games with an Omniscient Follower Paper • 2301.11518 • Published Jan 27, 2023
On Optimal Caching and Model Multiplexing for Large Model Inference Paper • 2306.02003 • Published Jun 3, 2023
Fine-Tuning Language Models with Advantage-Induced Policy Alignment Paper • 2306.02231 • Published Jun 4, 2023