Statistical Rejection Sampling Improves Preference Optimization Paper • 2309.06657 • Published Sep 13, 2023 • 13
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions Paper • 2309.10150 • Published Sep 18, 2023 • 24