rlhf - a yversleyamzn Collection

rlhf

Updated Sep 20, 2023

Statistical Rejection Sampling Improves Preference Optimization

Paper • 2309.06657 • Published Sep 13, 2023 • 13

Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

Paper • 2309.10150 • Published Sep 18, 2023 • 24