gregH committed on
Commit 794e59e · verified · 1 parent: dc75558

Update index.html

Files changed (1)
  1. index.html +18 -18
index.html CHANGED
@@ -4,13 +4,13 @@
  <meta charset="UTF-8">
 
  <!-- Begin Jekyll SEO tag v2.8.0 -->
- <title>Gradient Cuff | Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes </title>
- <meta property="og:title" content="Gradient Cuff" />
+ <title>NCTV | Neural Clamping Toolkit and Visualization for Neural Network Calibration</title>
+ <meta property="og:title" content="NCTV" />
  <meta property="og:locale" content="en_US" />
- <meta name="description" content="Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes" />
- <meta property="og:description" content="Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes" />
+ <meta name="description" content="Neural Clamping Toolkit and Visualization for Neural Network Calibration" />
+ <meta property="og:description" content="Neural Clamping Toolkit and Visualization for Neural Network Calibration" />
  <script type="application/ld+json">
- {"@context":"https://schema.org","@type":"WebSite","description":"Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes","headline":"Gradient Cuff","name":"Gradient Cuff","url":"https://huggingface.co/spaces/gregH/NCTV-GradientCuff"}</script>
+ {"@context":"https://schema.org","@type":"WebSite","description":"Neural Clamping Toolkit and Visualization for Neural Network Calibration","headline":"NCTV","name":"NCTV","url":"https://huggingface.co/spaces/hsiung/NCTV"}</script>
  <!-- End Jekyll SEO tag -->
 
  <link rel="preconnect" href="https://fonts.gstatic.com">
@@ -45,8 +45,8 @@
  <a id="skip-to-content" href="#content">Skip to the content.</a>
 
  <header class="page-header" role="banner">
- <h1 class="project-name">Gradient Cuff</h1>
- <h2 class="project-tagline">Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes</h2>
+ <h1 class="project-name">NCTV</h1>
+ <h2 class="project-tagline">Neural Clamping Toolkit and Visualization for Neural Network Calibration</h2>
 
 
  </header>
@@ -62,7 +62,7 @@ our proposed framework <strong>Neural Clamping</strong>, which employs a simple
  transformation on a pre-trained classifier. We also provide other calibration approaches
  (e.g., temperature scaling) to compare with Neural Clamping.</p>
 
- <h2 id="what-is-jailbreak">What is Jailbreak?</h2>
+ <h2 id="what-is-calibration">What is Calibration?</h2>
  <p>Neural Network Calibration seeks to make model prediction align with its true correctness likelihood.
  A well-calibrated model should provide accurate predictions and reliable confidence when making inferences. On the
  contrary, a poor calibration model would have a wide gap between its accuracy and average confidence level.
@@ -70,24 +70,24 @@ This phenomenon could hamper scenarios requiring accurate uncertainty estimation
  (e.g., autonomous driving systems, medical diagnosis, etc.).</p>
 
  <div class="container">
- <div id="jailbrerak-intro" class="row align-items-center jailbreak-intro-sec">
- <img id="jailbreak-intro-img" src="https://hsiung.cc/NCTV/images/conf_acc_demo.gif" />
+ <div id="calibration-intro" class="row align-items-center calibration-intro-sec">
+ <img id="calibration-intro-img" src="https://hsiung.cc/NCTV/images/conf_acc_demo.gif" />
  </div>
  </div>
 
- <h3 id="refusal-loss-function">Refusal Loss Function</h3>
+ <h3 id="calibration-metrics">Calibration Metrics</h3>
  <p>Objectively, researchers utilize <strong>Calibration Metrics</strong> to measure the calibration error for a model, for example,
  Expected Calibration Error (ECE), Static Calibration Error (SCE), Adaptive Calibration Error (ACE), etc.</p>
 
- <div class="container refusal-loss-function-intro-sec">
- <div><img id="refusal-loss-function-intro-img" src="images/metrics/intro-metric-example.png" /></div>
+ <div class="container calibration-intro-sec">
+ <div><img id="calibration-intro-img" src="images/metrics/intro-metric-example.png" /></div>
  </div>
 
- <div id="refusal-loss-formula" class="container">
- <div id="refusal-loss-formula-list" class="row align-items-center formula-list">
- <a href="#ECE-formula" class="selected">RL</a>
- <a href="#SCE-formula">SRR</a>
- <a href="#ACE-formula">GE</a>
+ <div id="calibration-metrics-formula" class="container">
+ <div id="calibration-metrics-formula-list" class="row align-items-center formula-list">
+ <a href="#ECE-formula" class="selected">ECE</a>
+ <a href="#SCE-formula">SCE</a>
+ <a href="#ACE-formula">ACE</a>
  <div style="clear: both"></div>
  </div>
  <div id="calibration-metrics-formula-content" class="row align-items-center">
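The page text in this commit describes a poorly calibrated model as one with a wide gap between accuracy and average confidence, and it names Expected Calibration Error (ECE) as a metric. The ECE formula itself is not part of this diff. The sketch below is a minimal illustration that assumes the common equal-width-binned definition of ECE over top-label confidences. The function name `expected_calibration_error` and the `n_bins` default of 15 are hypothetical choices made for this sketch. They are not taken from the NCTV toolkit.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    predictions: Sequence[int],
    labels: Sequence[int],
    n_bins: int = 15,  # hypothetical default, not from the NCTV toolkit
) -> float:
    """Equal-width-binned ECE over top-label confidences (illustrative sketch).

    ECE = sum over bins b of (|B_b| / N) * |accuracy(B_b) - mean_confidence(B_b)|
    """
    n = len(confidences)
    if n == 0 or n != len(predictions) or n != len(labels):
        raise ValueError("inputs must be non-empty and of equal length")
    if n_bins < 1:
        raise ValueError("n_bins must be positive")

    # Per-bin running totals: sample count, correct predictions, confidence sum.
    counts = [0] * n_bins
    correct = [0] * n_bins
    conf_sum = [0.0] * n_bins

    for conf, pred, label in zip(confidences, predictions, labels):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence {conf} is outside [0, 1]")
        # Bin k covers [k/n_bins, (k+1)/n_bins); conf == 1.0 goes to the last bin.
        k = min(int(conf * n_bins), n_bins - 1)
        counts[k] += 1
        correct[k] += int(pred == label)
        conf_sum[k] += conf

    # Weight each non-empty bin's |accuracy - confidence| gap by its share of samples.
    ece = 0.0
    for k in range(n_bins):
        if counts[k]:
            accuracy = correct[k] / counts[k]
            mean_conf = conf_sum[k] / counts[k]
            ece += (counts[k] / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Toy example: two over-confident predictions (0.95, half correct)
    # and two under-confident ones (0.65, both correct).
    score = expected_calibration_error(
        confidences=[0.95, 0.95, 0.65, 0.65],
        predictions=[1, 1, 0, 0],
        labels=[1, 0, 0, 0],
        n_bins=10,
    )
    print(f"ECE = {score:.3f}")  # ECE = 0.400
```

In the toy example, each bin contributes half of its accuracy/confidence gap: 0.5 × 0.45 + 0.5 × 0.35 = 0.4. The SCE and ACE metrics named on the page group predictions differently, per class or into equal-count bins. This sketch does not cover those variants.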