Update README.md
README.md
---
license: cc-by-4.0
---

# GALA (official)
Official implementation of *Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning*.

Code: https://github.com/SCccc21/GALA.git \
Paper: https://arxiv.org/abs/2504.01278

## Abstract

The exploitation of large language models (LLMs) for malicious purposes poses significant security
risks as these models become more powerful and widespread. While most existing red-teaming
frameworks focus on single-turn attacks, real-world adversaries typically operate in multi-turn
scenarios, iteratively probing for vulnerabilities and adapting their prompts based on the target model's
responses. In this paper, we propose GALA, a novel multi-turn red-teaming agent that emulates
sophisticated human attackers through two complementary learning dimensions: global tactic-wise
learning that accumulates knowledge over time and generalizes to new attack goals, and local prompt-wise learning that refines implementations for specific goals when initial attempts fail. Unlike
previous multi-turn approaches that rely on fixed strategy sets, GALA enables the agent to identify
new jailbreak tactics, develop a goal-based tactic selection framework, and refine prompt formulations
for selected tactics. Empirical evaluations on JailbreakBench demonstrate our framework’s superior
performance, achieving over 90% attack success rates against GPT-3.5-Turbo and Llama-3.1-70B
within 5 conversation turns and outperforming state-of-the-art baselines. These results highlight the
effectiveness of dynamic learning in identifying and exploiting model vulnerabilities in realistic
multi-turn scenarios.