HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal Paper • 2402.04249 • Published Feb 6, 2024 • 4
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique Paper • 2408.10701 • Published Aug 20, 2024 • 11
view article Article Llama 3.1 - 405B, 70B & 8B with multilinguality and long context Jul 23, 2024 • 225
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models Paper • 2312.06585 • Published Dec 11, 2023 • 28