"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models • Paper • 2308.03825 • Published Aug 7, 2023 • 2
facebook/roberta-hate-speech-dynabench-r4-target • Text Classification • Updated Mar 16, 2023 • 1.83M • 65