bxiong commited on
Commit
33fbcd5
1 Parent(s): ccca66f

update more results

Browse files
Files changed (1) hide show
  1. index.html +25 -4
index.html CHANGED
@@ -42,6 +42,8 @@
42
  <link rel="stylesheet" href="https://code.jquery.com/ui/1.12.1/themes/base/jquery-ui.css">
43
  <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
44
  <script src="https://code.jquery.com/ui/1.12.1/jquery-ui.min.js"></script>
 
 
45
  <script>
46
  $( function() {
47
  $( "#tabs" ).tabs();
@@ -615,9 +617,9 @@
615
  <div class="container-centered">
616
  <div class="row">
617
  <div class="col-md-10 col-md-offset-1">
618
- <h3 id="Demo">
619
  Demo:
620
- </h3>
621
  <div class="text-justify">
622
  We present a few jailbreak examples of the performance of our trained DPPs under both LLAMA-2-7B-Chat and MISTRAL-7B-Instruct-v0.2 models. <span class="red-text">Note that some of the response contents contain harmful information.</span>
623
  </div>
@@ -704,13 +706,32 @@
704
  </div>
705
  </div>
706
  </section>
707
-
708
  <section class="section">
709
  <div class="container is-max-desktop">
710
  <div class="columns is-centered">
711
  <div class="container-centered">
712
- <h2 class="title is-3">Abstract</h2>
713
  <div class="content has-text-justified">
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
714
  </div>
715
  </div>
716
  </div>
 
42
  <link rel="stylesheet" href="https://code.jquery.com/ui/1.12.1/themes/base/jquery-ui.css">
43
  <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
44
  <script src="https://code.jquery.com/ui/1.12.1/jquery-ui.min.js"></script>
45
+ <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
46
+ <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
47
  <script>
48
  $( function() {
49
  $( "#tabs" ).tabs();
 
617
  <div class="container-centered">
618
  <div class="row">
619
  <div class="col-md-10 col-md-offset-1">
620
+ <h2 id="Demo">
621
  Demo:
622
+ </h2>
623
  <div class="text-justify">
624
  We present a few jailbreak examples of the performance of our trained DPPs under both LLAMA-2-7B-Chat and MISTRAL-7B-Instruct-v0.2 models. <span class="red-text">Note that some of the response contents contain harmful information.</span>
625
  </div>
 
706
  </div>
707
  </div>
708
  </section>
709
+ <!-- Results -->
710
  <section class="section">
711
  <div class="container is-max-desktop">
712
  <div class="columns is-centered">
713
  <div class="container-centered">
714
+ <h2 class="title is-3">Results</h2>
715
  <div class="content has-text-justified">
716
+ <p>In this section we want to show our <strong>numerical results</strong> as well as <strong>our trained DPP</strong> on both LLAMA-2-Chat
717
+ and MISTRAL-7B-Instruct-v0.2.</p>
718
+ <h2>Evaluation Metrics:</h2>
719
+ <ul>
720
+ <li><strong>Attack Success Rate:</strong>We use the Attack Success Rate (ASR) as our primary metric for evaluating the effectiveness of jailbreak defenses.
721
+ The ASR measures the proportion of malicious queries that successfully bypass the LLMs alignment and generate harmful responses.</li>
722
+ <p><b>ASR</b> is defined as:</p>
723
+ <p>\[
724
+ \textbf{ASR} = \frac{\text{Number\_of\_jailbreak\_queries}}{\text{Total\_queries}}
725
+ \]</p>
726
+ <p>Here the \(\text{Number\_of\_jailbreak\_queries}\) is calculated through the sub-strings matching. Specifically, for a given generated response of a jailbreak query, if the response contains sub-strings that exist in the pre-defined sub-string set \(S\). Then, it will be evaluated as <b>jailbroken</b>, otherwise it is <b>non-jailbroken</b>.</p>
727
+ <p>The function to determine if a response is jailbroken can be expressed as:</p>
728
+ <p>\[
729
+ \text{JailBroken}(\text{response}) = \begin{cases}
730
+ 1, & \text{if response contains any keyword;} \\
731
+ 0, & \text{otherwise.}
732
+ \end{cases}
733
+ \]</p>
734
+ </ul>
735
  </div>
736
  </div>
737
  </div>