Stabilizing RLHF through Advantage Model and Selective Rehearsal Paper • 2309.10202 • Published Sep 18, 2023 • 9
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions Paper • 2309.10150 • Published Sep 18, 2023 • 24
Robotic Offline RL from Internet Videos via Value-Function Pre-Training Paper • 2309.13041 • Published Sep 22, 2023 • 8
Voyager: An Open-Ended Embodied Agent with Large Language Models Paper • 2305.16291 • Published May 25, 2023 • 9
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning Paper • 2310.20587 • Published Oct 31, 2023 • 16
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback Paper • 2312.00849 • Published Dec 1, 2023 • 8
RLVF: Learning from Verbal Feedback without Overgeneralization Paper • 2402.10893 • Published Feb 16 • 10
Learning to Learn Faster from Human Feedback with Language Model Predictive Control Paper • 2402.11450 • Published Feb 18 • 20
Reward Steering with Evolutionary Heuristics for Decoding-time Alignment Paper • 2406.15193 • Published Jun 21 • 12
WARP: On the Benefits of Weight Averaged Rewarded Policies Paper • 2406.16768 • Published Jun 24 • 22
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning Paper • 2408.08441 • Published Aug 15 • 6