SCccc21 committed on
Commit f301de9 · verified · 1 Parent(s): 3990344

Update README.md

Files changed (1)
  1. README.md +27 -3
README.md CHANGED
@@ -1,3 +1,27 @@
- ---
- license: cc-by-4.0
- ---
+ ---
+ license: cc-by-4.0
+ ---
+
+ # GALA (official)
+ Official implementation for: Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning
+
+ Code: https://github.com/SCccc21/GALA.git \
+ Paper: https://arxiv.org/abs/2504.01278
+
+ ## Abstract
+
+ The exploitation of large language models (LLMs) for malicious purposes poses significant security
+ risks as these models become more powerful and widespread. While most existing red-teaming
+ frameworks focus on single-turn attacks, real-world adversaries typically operate in multi-turn
+ scenarios, iteratively probing for vulnerabilities and adapting their prompts based on threat model
+ responses. In this paper, we propose GALA, a novel multi-turn red-teaming agent that emulates
+ sophisticated human attackers through complementary learning dimensions: global tactic-wise
+ learning that accumulates knowledge over time and generalizes to new attack goals, and local prompt-wise learning that refines implementations for specific goals when initial attempts fail. Unlike
+ previous multi-turn approaches that rely on fixed strategy sets, GALA enables the agent to identify
+ new jailbreak tactics, develop a goal-based tactic selection framework, and refine prompt formulations
+ for selected tactics. Empirical evaluations on JailbreakBench demonstrate our framework’s superior
+ performance, achieving over 90% attack success rates against GPT-3.5-Turbo and Llama-3.1-70B
+ within 5 conversation turns, outperforming state-of-the-art baselines. These results highlight the
+ effectiveness of dynamic learning in identifying and exploiting model vulnerabilities in realistic
+ multi-turn scenarios.
+