"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models — paper, arXiv:2308.03825, published Aug 7, 2023
Illustrating Reinforcement Learning from Human Feedback (RLHF) — article, Dec 9, 2022
Awesome RLHF — collection: a curated set of datasets, models, Spaces, and papers on Reinforcement Learning from Human Feedback (RLHF); 11 items, updated Oct 2, 2023