Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction. arXiv:2402.02416, published Feb 4, 2024.
Safe RLHF: Safe Reinforcement Learning from Human Feedback. arXiv:2310.12773, published Oct 19, 2023.