<!DOCTYPE html>
<html lang="en">
<head>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-178132094-1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
dataLayer.push(arguments);
}
gtag("js", new Date());
gtag("config", "UA-178132094-1");
</script>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<!-- <meta name="viewport" content="width=1024" /> -->
<title>OR-Bench: Over Refusal Benchmark</title>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<link href="https://fonts.googleapis.com/css2?family=Montserrat:wght@400;700&display=swap" rel="stylesheet">
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js"></script>
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">
<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script type="text/javascript" async
src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/foundation/6.4.3/css/foundation.min.css" />
<link rel="stylesheet" href="https://cdn.rawgit.com/jpswalsh/academicons/master/css/academicons.min.css" />
<script src="https://kit.fontawesome.com/b939870cfb.js" crossorigin="anonymous"></script>
<link rel="stylesheet" href="https://cdn.datatables.net/1.10.24/css/dataTables.foundation.min.css">
<script type="text/javascript" src="https://cdn.datatables.net/1.10.24/js/jquery.dataTables.min.js"></script>
<link rel="stylesheet" href="./css/main.css" />
</head>
<body>
<nav id="navbar" class="navbar navbar-expand-md">
<div class="container">
<a class="navbar-brand" href="./index.html"
>OR-Bench</a>
<button
class="navbar-toggler navbar-light"
type="button"
data-toggle="collapse"
data-target="#main-navigation"
>
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="main-navigation">
<ul class="navbar-nav">
<li class="nav-item">
<a class="nav-link" href="#leaderboard">Leaderboards</a>
</li>
<li class="nav-item">
<a class="nav-link" href="https://arxiv.org/abs/2405.20947" target="_blank">Paper</a>
</li>
<li class="nav-item">
<a class="nav-link text-nowrap" href="https://github.com/justincui03/or-bench"
target="_blank">Github</a>
</li>
</ul>
</div>
</div>
</nav>
<!-- <hr class="toprule" /> -->
<header>
<div class="header-block container">
<div class="title-logo"><img src="./images/logo.png" alt="logo" /></div>
<div class="title">OR-BENCH</div>
<div class="description">
An over-refusal benchmark for large language models
</div>
</div>
</header>
<!-- <hr class="toprule" /> -->
<div class="container">
<section id="introduction">
<div class="overview">
<p class="doublealign">
<b>Large Language Models (LLMs)</b> require careful safety alignment to prevent malicious outputs. While significant research focuses on mitigating harmful content generation,
this enhanced safety often comes with the side effect of over-refusal, where LLMs reject innocuous prompts and become less helpful.
Although over-refusal has been observed empirically, measuring it systematically is challenging
because it is difficult to craft prompts that appear harmful but are actually benign.<br><br>
We introduce OR-Bench, the <b>first large-scale over-refusal benchmark</b>. OR-Bench comprises 80,000 seemingly toxic prompts across 10 common rejection categories, a subset of around 1,000 hard prompts that are challenging even for state-of-the-art LLMs, and an additional 600 toxic prompts to prevent indiscriminate responses.<br><br>
We plot the evaluation results in the figure below. The x-axis shows the rejection rate on seemingly toxic prompts and the y-axis shows the rejection rate on real toxic prompts. Ideally, a model sits in the top-left corner, rejecting the most toxic prompts and the fewest seemingly toxic prompts.
</p>
<div style="margin-top:20px"><img src="./images/overall_x_y_plot.png" style="width: 100%;"/></div>
</div>
</section>
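<!--
Illustrative only, not part of the OR-Bench tooling: a minimal sketch of how the
two rejection rates plotted above could be computed from per-prompt results. The
record fields "set" and "refused" are hypothetical.

function rejectionRates(results) {
// results: array of { set: "seemingly_toxic" | "toxic", refused: true | false }
var totals = { seemingly_toxic: 0, toxic: 0 };
var refusals = { seemingly_toxic: 0, toxic: 0 };
results.forEach(function (r) {
totals[r.set] += 1;
if (r.refused) refusals[r.set] += 1;
});
return {
x: refusals.seemingly_toxic / totals.seemingly_toxic, // lower is better
y: refusals.toxic / totals.toxic // higher is better
};
}
-->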
<div class="divider"><hr /></div>
<section class="container" id="div_cifar10_ipc1_heading">
<div id="div_or_bench" class="display responsive nowrap" style="width:100%"></div>
</section>
<div class="divider"><hr /></div>
<!-- <script
type="module"
src="https://gradio.s3-us-west-2.amazonaws.com/4.31.0/gradio.js"
></script> -->
<div><b>Please try out our demos below 🚀</b></div>
<div class="iframe-container">
<iframe
id="myIframe"
src="https://bench-llm-or-bench.hf.space"
frameborder="0"
width="2160"
height="450"
></iframe>
</div>
<div class="vspace50"></div>
<div id="new_result" style="width:100%">
<a class="btn btn-secondary" style="width:100%" href="https://forms.gle/WEeSVBENFWMXMuYK8" target="_blank">Submit New Results <img src="images/click.png" width="20"/></a>
</div>
<section id="citation" >
<div class="heading">
<p>Citation</p>
</div>
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #555555; font-weight: bold">@article</span>{cui2024or,
title<span style="color: #333333">=</span>{OR<span style="color: #333333">-</span>Bench: An Over<span style="color: #333333">-</span>Refusal Benchmark <span style="color: #008800; font-weight: bold">for</span> Large Language Models},
author<span style="color: #333333">=</span>{Cui, Justin <span style="color: #000000; font-weight: bold">and</span> Chiang, Wei<span style="color: #333333">-</span>Lin <span style="color: #000000; font-weight: bold">and</span> Stoica, Ion <span style="color: #000000; font-weight: bold">and</span> Hsieh, Cho<span style="color: #333333">-</span>Jui},
journal<span style="color: #333333">=</span>{arXiv preprint arXiv:<span style="color: #6600EE; font-weight: bold">2405.20947</span>},
year<span style="color: #333333">=</span>{<span style="color: #0000DD; font-weight: bold">2024</span>}
}
</pre></div>
</section>
<div class="vspace50"></div>
</div>
<hr class="bottomrule" />
<footer>
<small>&copy; 2024, OR-Bench</small>
</footer>
<script>
// When the user scrolls the page, execute myFunction
window.onscroll = function () {
myFunction();
};
// Get the navbar
var navbar = document.getElementById("navbar");
// Get the offset position of the navbar
var sticky = navbar.offsetTop;
// Add the sticky class to the navbar when you reach its scroll position. Remove "sticky" when you leave the scroll position
function myFunction() {
if (window.pageYOffset >= sticky) {
navbar.classList.add("sticky");
} else {
navbar.classList.remove("sticky");
}
}
</script>
<script>
$("#div_or_bench").load("./data/or-bench.html", function() {
$('#or-bench-table').DataTable({
"pageLength": 25, // Set the initial number of entries
"lengthMenu": [[10, 25, 50, -1], [10, 25, 50, "All"]], // Set options for lengthMenu
"order": [[3, "asc"]], // Sort by the third column (index 2) in descending order
"paging": false, // Disables pagination
"responsive": true // Enable responsive feature
});
});
</script>
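<!--
Illustrative only, not the actual data file: the loader above assumes that
./data/or-bench.html contains a table with id "or-bench-table" and that its
fourth column (index 3) is the one sorted on. A minimal sketch of such a
fragment, with hypothetical column names:

<table id="or-bench-table" class="display responsive nowrap" style="width:100%">
<thead>
<tr><th>Model</th><th>Toxic Prompt Rejection Rate</th><th>Hard Prompt Rejection Rate</th><th>Seemingly Toxic Rejection Rate</th></tr>
</thead>
<tbody>
<tr><td>example-model</td><td>0.98</td><td>0.35</td><td>0.12</td></tr>
</tbody>
</table>
-->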
</body>
</html>