Agent-SafetyBench: Evaluating the Safety of LLM Agents Paper • 2412.14470 • Published Dec 2024 • 12
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks Paper • 2407.02855 • Published Jul 3, 2024 • 10
SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions Paper • 2309.07045 • Published Sep 13, 2023
Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization Paper • 2311.09096 • Published Nov 15, 2023