gregH committed on
Commit
8b5c98a
1 Parent(s): c1e761a

Update index.html

Files changed (1)
  1. index.html +5 -5
index.html CHANGED
@@ -130,11 +130,11 @@ Exploring Refusal Loss Landscapes </title>
  <div class="container"><img id="gradient-cuff-header" src="./gradient_cuff.png" /></div>
 
  <h2 id="demonstration">Demonstration</h2>
- <p>In the current research, a reliability diagram is drawn to show the calibration performance of a model. However, since
- reliability diagrams often only provide fixed bar graphs statically, further explanation from the chart is limited. In
- this demonstration, we show how to make reliability diagrams interactive and insightful to help researchers and
- developers gain more insights from the graph. Specifically, we provide three CIFAR-100 classification models
- in this demonstration. Multiple Bin numbers are also supported </p>
+ <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder) against 6
+ different jailbreak attacks~(GCG, AutoDAN, PAIR, TAP, Base64, and LRL) and benign user queries on 2 LLMs (LLaMA-2-7B-Chat and Vicuna-7B-V1.5).
+ We report the average refusal rate across these 6 malicious user query datasets as True Positive Rate~(TPR) and the refusal rate
+ on benign user queries as False Positive Rate~(FPR).
+ </p>
 
  <p>We hope this tool could also facilitate the development process.</p>
 
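The metric definition in the added paragraph can be sketched as follows. This is only an illustration, not code from the repository: the function names `refusal_rate` and `tpr_fpr` are hypothetical, the refusal flags are made-up placeholders rather than reported results, and it assumes "average refusal rate across these 6 datasets" means an unweighted mean of the per-dataset rates.

```python
from statistics import mean


def refusal_rate(refused):
    """Fraction of queries the defended model refused (list of booleans)."""
    if not refused:
        raise ValueError("need at least one query")
    return sum(refused) / len(refused)


def tpr_fpr(malicious_by_attack, benign):
    """Return (TPR, FPR, per-attack refusal rates).

    Assumption: TPR is the unweighted mean of the per-attack refusal rates,
    and FPR is the refusal rate on benign queries.
    """
    per_attack = {name: refusal_rate(flags)
                  for name, flags in malicious_by_attack.items()}
    tpr = mean(per_attack.values())
    fpr = refusal_rate(benign)
    return tpr, fpr, per_attack


# Placeholder flags (True = the model refused); NOT measured results.
malicious = {
    "GCG":     [True, True, True, True],
    "AutoDAN": [True, True, True, False],
    "PAIR":    [True, True, False, False],
    "TAP":     [True, True, True, False],
    "Base64":  [True, True, True, True],
    "LRL":     [True, False, True, False],
}
benign = [False, False, False, True]

tpr, fpr, per_attack = tpr_fpr(malicious, benign)
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")
```

Averaging per-dataset rates weights each attack equally regardless of dataset size; pooling all malicious queries before dividing would give a different number when the datasets differ in size.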