Update index.html
index.html CHANGED (+4 -7)
@@ -143,13 +143,10 @@ Exploring Refusal Loss Landscapes </title>
 <p>
 Gradient Cuff can be summarized into two phases:
 <span>
-
-
-
-
-\end{itemize}
-$$
-</span>
+<strong>(Phase 1) Sampling-based Rejection:</strong> In the first step, we check whether $f_\theta(x) < 0.5$. If so, $x$ is rejected; otherwise, $x$ is passed on to Phase 2.
+</p>
+<p>
+<strong>(Phase 2) Gradient Norm Rejection:</strong> In the second step, we regard $x$ as a jailbreak attempt if the norm of the estimated gradient $g_\theta(x)$ is larger than a configurable threshold $t$, i.e., $\|g_\theta(x)\| > t$.
 </p>
 
 
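The two phases described in the added lines can be read as a simple decision rule. The following is a minimal sketch only, not the authors' implementation: the function name `gradient_cuff_decision` and the callables `refusal_loss` (standing in for $f_\theta$) and `grad_estimate` (standing in for $g_\theta$) are hypothetical placeholders, and the Euclidean norm is an assumption, since the diff does not say which norm is used or how either quantity is computed.

```python
import math
from typing import Callable, Sequence


def gradient_cuff_decision(
    x: object,
    refusal_loss: Callable[[object], float],             # hypothetical stand-in for f_theta
    grad_estimate: Callable[[object], Sequence[float]],  # hypothetical stand-in for g_theta
    threshold: float,                                     # the configurable threshold t
) -> str:
    """Apply the two-phase rule: returns 'reject_phase1', 'reject_phase2', or 'accept'."""
    # Phase 1 (sampling-based rejection): reject when f_theta(x) < 0.5.
    if refusal_loss(x) < 0.5:
        return "reject_phase1"

    # Phase 2 (gradient norm rejection): reject when ||g_theta(x)|| > t.
    # Assumption: the Euclidean norm of the estimated gradient.
    grad_norm = math.sqrt(sum(g * g for g in grad_estimate(x)))
    if grad_norm > threshold:
        return "reject_phase2"

    return "accept"


# Usage with stub functions in place of a real model:
decision = gradient_cuff_decision(
    "example query",
    refusal_loss=lambda q: 0.9,
    grad_estimate=lambda q: [3.0, 4.0],
    threshold=1.0,
)
print(decision)
```

A query only reaches the gradient estimate if it survives Phase 1, so in this sketch the gradient step is skipped for queries that are already rejected.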