Spaces: TrustSafeAI/GradientCuff-Jailbreak-Defense

gregH committed on Feb 26, 2024

Commit 7261a26 (verified) · 1 Parent(s): 1351252

Update index.html

Files changed (1)

index.html +10 -9

@@ -4,13 +4,14 @@
     <meta charset="UTF-8">
 <!-- Begin Jekyll SEO tag v2.8.0 -->
-<title>NCTV | Neural Clamping Toolkit and Visualization for Neural Network Calibration</title>
-<meta property="og:title" content="NCTV" />
 <meta property="og:locale" content="en_US" />
-<meta name="description" content="Neural Clamping Toolkit and Visualization for Neural Network Calibration" />
-<meta property="og:description" content="Neural Clamping Toolkit and Visualization for Neural Network Calibration" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"WebSite","description":"Neural Clamping Toolkit and Visualization for Neural Network Calibration","headline":"NCTV","name":"NCTV","url":"https://huggingface.co/spaces/hsiung/NCTV"}</script>
 <!-- End Jekyll SEO tag -->
     <link rel="preconnect" href="https://fonts.gstatic.com">
@@ -45,8 +46,8 @@
     <a id="skip-to-content" href="#content">Skip to the content.</a>
     <header class="page-header" role="banner">
-      <h1 class="project-name">NCTV</h1>
-      <h2 class="project-tagline">Neural Clamping Toolkit and Visualization for Neural Network Calibration</h2>
     </header>
@@ -62,7 +63,7 @@ our proposed framework <strong>Neural Clamping</strong>, which employs a simple
 transformation on a pre-trained classifier. We also provide other calibration approaches
 (e.g., temperature scaling) to compare with Neural Clamping.</p>
-<h2 id="what-is-calibration">What is Calibration?</h2>
 <p>Neural Network Calibration seeks to make model prediction align with its true correctness likelihood.
 A well-calibrated model should provide accurate predictions and reliable confidence when making inferences. On the
 contrary, a poor calibration model would have a wide gap between its accuracy and average confidence level.
@@ -196,7 +197,7 @@ Using this tool, users can use our proposed package, \(\texttt{NCTookit}\), to c
 <p>If you find Neural Clamping helpful and useful for your research, please cite our main paper as follows:</p>
 <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@inproceedings{hsiung2023nctv,
-  title={{NCTV: Neural Clamping Toolkit and Visualization for Neural Network Calibration}},
   author={Lei Hsiung, Yung-Chen Tang and Pin-Yu Chen and Tsung-Yi Ho},
   booktitle={Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence},
   publisher={Association for the Advancement of Artificial Intelligence},

     <meta charset="UTF-8">
 <!-- Begin Jekyll SEO tag v2.8.0 -->
+<title>Gradient Cuff | Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by
+Exploring Refusal Loss Landscapes </title>
+<meta property="og:title" content="Gradient Cuff" />
 <meta property="og:locale" content="en_US" />
+<meta name="description" content="Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes" />
+<meta property="og:description" content="Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes" />
 <script type="application/ld+json">
+{"@context":"https://schema.org","@type":"WebSite","description":"Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes","headline":"Gradient Cuff","name":"Gradient Cuff","url":"https://huggingface.co/spaces/gregH/Gradient Cuff"}</script>
 <!-- End Jekyll SEO tag -->
     <link rel="preconnect" href="https://fonts.gstatic.com">
     <a id="skip-to-content" href="#content">Skip to the content.</a>
     <header class="page-header" role="banner">
+      <h1 class="project-name">Gradient Cuff</h1>
+      <h2 class="project-tagline">Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes</h2>
     </header>
 transformation on a pre-trained classifier. We also provide other calibration approaches
 (e.g., temperature scaling) to compare with Neural Clamping.</p>
+<h2 id="what-is-jailbreak">What is Calibration?</h2>
 <p>Neural Network Calibration seeks to make model prediction align with its true correctness likelihood.
 A well-calibrated model should provide accurate predictions and reliable confidence when making inferences. On the
 contrary, a poor calibration model would have a wide gap between its accuracy and average confidence level.
 <p>If you find Neural Clamping helpful and useful for your research, please cite our main paper as follows:</p>
 <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@inproceedings{hsiung2023nctv,
+  title={{NCTV: Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes}},
   author={Lei Hsiung, Yung-Chen Tang and Pin-Yu Chen and Tsung-Yi Ho},
   booktitle={Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence},
   publisher={Association for the Advancement of Artificial Intelligence},