Update index.html
index.html +1 -1
@@ -158,7 +158,7 @@ We provide more details about the running flow of Gradient Cuff in the paper.
 <h2 id="demonstration">Demonstration</h2>
 <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder) against 6
 different jailbreak attacks~(GCG, AutoDAN, PAIR, TAP, Base64, and LRL) and benign user queries on 2 LLMs (LLaMA-2-7B-Chat and Vicuna-7B-V1.5).
-We demonstrate the average refusal rate across these 6 malicious user query datasets and the refusal rate
+We demonstrate the average refusal rate across these 6 malicious user query datasets as the Average Malicious Refusal Rate and the refusal rate
 on benign user queries as the Benign Refusal Rate.
 </p>
 