- Efficient RLHF: Reducing the Memory Usage of PPO
  Paper • 2309.00754 • Published • 15
- Statistical Rejection Sampling Improves Preference Optimization
  Paper • 2309.06657 • Published • 14
- Aligning Large Multimodal Models with Factually Augmented RLHF
  Paper • 2309.14525 • Published • 30
- Stabilizing RLHF through Advantage Model and Selective Rehearsal
  Paper • 2309.10202 • Published • 11
Dong Li
dongleecsu
Collections: 1