gregH committed on
Commit 7ee8287 · verified · 1 Parent(s): 2d4556c

Update index.html

Files changed (1)
  1. index.html +4 -3
index.html CHANGED
@@ -142,11 +142,12 @@ Exploring Refusal Loss Landscapes </title>
142
 
143
  <p>
144
  Gradient Cuff can be summarized into two phases:
145
- <span>
146
- <strong>(Phase 1) Sampling-based Rejection:</strong> In the first step, we reject the user query by checking whether $f_\theta(x)<0.5$. If true, then $x$ is rejected, otherwise, $x$ is pushed into phase 2.
147
  </p>
148
  <p>
149
- <strong>(Phase 2) Gradient Norm Rejection:</strong> In the second step, we regard $x$ as having jailbreak attempts if the norm of the estimated gradient $g_\theta(x)$ is larger than a configurable threshold $t$, i.e., $\|g_\theta(x)\| > t$.
 
 
 
150
  </p>
151
 
152
 
 
142
 
143
  <p>
144
  Gradient Cuff can be summarized into two phases:
 
 
145
  </p>
146
  <p>
147
+ <strong>(Phase 1) Sampling-based Rejection:</strong> In the first step, we reject the user query if the Refusal Loss value is below 0.5. Otherwise, the user query is passed on to Phase 2.
148
+ </p>
149
+ <p>
150
+ <strong>(Phase 2) Gradient Norm Rejection:</strong> In the second step, we flag the user query as a jailbreak attempt if the norm of the estimated gradient is larger than a configurable threshold t.
151
  </p>
152
 
153
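
For readers of this change, the two-phase rule described in the updated paragraphs can be sketched as below. This is a minimal illustration and not the authors' implementation: the function name `gradient_cuff_decision` and its return labels are hypothetical, the Refusal Loss value and the estimated gradient are assumed to be computed elsewhere and passed in, and the norm is assumed to be Euclidean.

```python
import math
from typing import Sequence


def gradient_cuff_decision(
    refusal_loss: float,
    grad_estimate: Sequence[float],
    threshold: float,
) -> str:
    """Apply the two-phase rule to precomputed quantities (hypothetical helper).

    refusal_loss:  Refusal Loss value for the user query (computed elsewhere).
    grad_estimate: estimated gradient for the query (computed elsewhere).
    threshold:     the configurable threshold t from Phase 2.
    """
    # Phase 1: reject when the Refusal Loss value is below 0.5.
    if refusal_loss < 0.5:
        return "rejected_phase1"

    # Phase 2: reject when the gradient norm exceeds the threshold t.
    # Assumption: Euclidean (L2) norm.
    grad_norm = math.sqrt(sum(g * g for g in grad_estimate))
    if grad_norm > threshold:
        return "rejected_phase2"

    return "accepted"
```

For example, a query with Refusal Loss 0.8 and estimated gradient [3.0, 4.0] has norm 5.0, so it would be rejected in Phase 2 for any threshold below 5.0 and accepted otherwise.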