tybrs committed
Commit 5a92a20
1 Parent(s): b9e00cb

Update Space (evaluate main: f805d6ec)

Files changed (1):
  1. index.html +35 -19
index.html CHANGED
@@ -1,19 +1,35 @@
- <!DOCTYPE html>
- <html>
- <head>
- <meta charset="utf-8" />
- <meta name="viewport" content="width=device-width" />
- <title>My static Space</title>
- <link rel="stylesheet" href="style.css" />
- </head>
- <body>
- <div class="card">
- <h1>Welcome to your static Space!</h1>
- <p>You can modify this app directly by editing <i>index.html</i> in the Files and versions tab.</p>
- <p>
- Also don't forget to check the
- <a href="https://huggingface.co/docs/hub/spaces" target="_blank">Spaces documentation</a>.
- </p>
- </div>
- </body>
- </html>
+ <h1 id="adversarial-glue-evaluation-suite">Adversarial GLUE Evaluation Suite</h1>
+ <h2 id="description">Description</h2>
+ <p>This evaluation suite compares GLUE results with Adversarial GLUE (AdvGLUE), a multi-task benchmark that evaluates the robustness of modern large-scale language models against various types of adversarial attacks.</p>
+ <h2 id="how-to-use">How to use</h2>
+ <p>This suite requires an installation of the <a href="https://github.com/IntelAI/evaluate/tree/develop">IntelAI/evaluate</a> fork.</p>
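+ <p>One way to install the fork is directly from its <code>develop</code> branch with <code>pip</code> (a minimal sketch, assuming the repository is installable from source; adjust the branch or ref as needed):</p>
+ <pre><code># Install the IntelAI fork of evaluate from its develop branch
+ pip install git+https://github.com/IntelAI/evaluate.git@develop</code></pre>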
+ <p>After installation, there are two steps: (1) loading the Adversarial GLUE suite, and (2) calculating the metric.</p>
+ <ol type="1">
+ <li><strong>Loading the relevant GLUE metric</strong>: this suite loads evaluation subtasks for the following tasks on both the AdvGLUE and GLUE datasets: <code>sst2</code>, <code>mnli</code>, <code>qnli</code>, <code>rte</code>, and <code>qqp</code>.</li>
+ </ol>
+ <p>More information about the different subsets of the GLUE dataset can be found on the <a href="https://huggingface.co/datasets/glue">GLUE dataset page</a>.</p>
+ <ol start="2" type="1">
+ <li><strong>Calculating the metric</strong>: the metric takes one input, the name of the model or pipeline to evaluate.</li>
+ </ol>
+ <pre><code class="language-python">from evaluate import EvaluationSuite
+ 
+ # Load the AdvGLUE/GLUE evaluation suite, then run it on a model name or pipeline
+ suite = EvaluationSuite.load('intel/adversarial_glue')
+ results = suite.run("gpt2")</code></pre>
+ <h2 id="output-results">Output results</h2>
+ <p>The output of the metric depends on the GLUE subset chosen; it consists of a dictionary containing one or more of the following metrics:</p>
+ <p><code>accuracy</code>: the proportion of correct predictions among the total number of cases processed, ranging between 0 and 1 (see <a href="https://huggingface.co/metrics/accuracy">accuracy</a> for more information).</p>
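+ <p>A minimal sketch of inspecting the output, assuming (as in the upstream <code>evaluate</code> library) that <code>suite.run</code> returns a list of dictionaries, each carrying a task name alongside its metric values; the exact key names in the fork are an assumption here:</p>
+ <pre><code class="language-python"># Hypothetical inspection of the suite results; key names are assumptions
+ for result in results:
+     print(result.get("task_name"), result.get("accuracy"))</code></pre>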
+ <h3 id="values-from-popular-papers">Values from popular papers</h3>
+ <p>The <a href="https://huggingface.co/datasets/glue">original GLUE paper</a> reported average scores ranging from 58% to 64%, depending on the model used (with all evaluation values scaled by 100 so that an average across tasks can be computed).</p>
+ <p>For more recent model performance, see the <a href="https://paperswithcode.com/dataset/glue">dataset leaderboard</a>.</p>
+ <h2 id="examples">Examples</h2>
+ <p>For a full example, see <a href="https://github.com/IntelAI/evaluate/blob/develop/notebooks/HF%20Evaluate%20Adversarial%20Attacks.ipynb">HF Evaluate Adversarial Attacks.ipynb</a>.</p>
+ <h2 id="limitations-and-bias">Limitations and bias</h2>
+ <p>This metric works only with datasets that have the same format as the <a href="https://huggingface.co/datasets/glue">GLUE dataset</a>.</p>
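+ <p>A quick way to check whether a custom dataset matches the expected format is to inspect the columns of the corresponding GLUE subset; for example, the <code>sst2</code> subset exposes <code>sentence</code>, <code>label</code>, and <code>idx</code> fields:</p>
+ <pre><code class="language-python">from datasets import load_dataset
+ 
+ # Inspect the schema of a GLUE subset to compare against a custom dataset
+ ds = load_dataset("glue", "sst2", split="validation")
+ print(ds.column_names)  # ['sentence', 'label', 'idx']</code></pre>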
+ <p>While the GLUE dataset is meant to represent “General Language Understanding”, the tasks it contains are not necessarily representative of language understanding as a whole, and results on it should not be interpreted as such.</p>
+ <h2 id="citation">Citation</h2>
+ <pre><code class="language-bibtex">@inproceedings{wang2021adversarial,
+   title={Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models},
+   author={Wang, Boxin and Xu, Chejian and Wang, Shuohang and Gan, Zhe and Cheng, Yu and Gao, Jianfeng and Awadallah, Ahmed Hassan and Li, Bo},
+   booktitle={Advances in Neural Information Processing Systems},
+   year={2021}
+ }</code></pre>