<!DOCTYPE html>
<html lang="en">
<head>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-178132094-1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
dataLayer.push(arguments);
}
gtag("js", new Date());
gtag("config", "UA-178132094-1");
</script>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<!-- <meta name="viewport" content="width=1024" /> -->
<title>OR-Bench: Over Refusal Benchmark</title>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<link href="https://fonts.googleapis.com/css2?family=Montserrat:wght@400;700&display=swap" rel="stylesheet">
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js"></script>
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">
<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script type="text/javascript" async
src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/foundation/6.4.3/css/foundation.min.css" />
<link rel="stylesheet" href="https://cdn.rawgit.com/jpswalsh/academicons/master/css/academicons.min.css" />
<script src="https://kit.fontawesome.com/b939870cfb.js" crossorigin="anonymous"></script>
<link rel="stylesheet" href="https://cdn.datatables.net/1.10.24/css/dataTables.foundation.min.css">
<script type="text/javascript" src="https://cdn.datatables.net/1.10.24/js/jquery.dataTables.min.js"></script>
<link rel="stylesheet" href="./css/main.css" />
</head>
<body>
<nav id="navbar" class="navbar navbar-expand-md">
<div class="container">
<a class="navbar-brand" href="./index.html"
>OR-Bench</a>
<button
class="navbar-toggler navbar-light"
type="button"
data-toggle="collapse"
data-target="#main-navigation"
>
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="main-navigation">
<ul class="navbar-nav">
<li class="nav-item">
<a class="nav-link" href="#leaderboard">Leaderboards</a>
</li>
<li class="nav-item">
<a class="nav-link" href="https://arxiv.org/abs/2405.20947" target="_blank">Paper</a>
</li>
<li class="nav-item">
<a class="nav-link text-nowrap" href="https://github.com/justincui03/or-bench"
target="_blank">Github</a>
</li>
</ul>
</div>
</div>
</nav>
<!-- <hr class="toprule" /> -->
<header>
<div class="header-block container">
<div class="title-logo"><img src="./images/logo.png" alt="logo" /></div>
<div class="title">OR-BENCH</div>
<div class="description">
An over-refusal benchmark for large language models
</div>
</div>
</header>
<!-- <hr class="toprule" /> -->
<div class="container">
<section id="introduction">
<div class="overview">
<p class="doublealign">
<b>Large Language Models (LLMs)</b> require careful safety alignment to prevent malicious outputs. While significant research focuses on mitigating harmful content generation,
this enhanced safety often comes with the side effect of over-refusal, where LLMs reject innocuous prompts and become less helpful.
Although over-refusal has been observed empirically, measuring it systematically is challenging
because it is difficult to craft prompts that appear harmful but are actually benign.<br><br>
We introduce OR-Bench, the <b>first large-scale over-refusal benchmark</b>. OR-Bench comprises 80,000 seemingly toxic prompts across 10 common rejection categories, a subset of around 1,000 hard prompts that are challenging even for state-of-the-art LLMs, and an additional 600 toxic prompts to prevent indiscriminate responses.<br><br>
We plot the evaluation results in the figure below. The x-axis shows the rejection rate on seemingly toxic prompts and the y-axis shows the rejection rate on real toxic prompts. Ideally, a model sits in the top-left corner, rejecting the most toxic prompts and the fewest seemingly toxic prompts.
</p>
<div style="margin-top:20px"><img src="./images/overall_x_y_plot.png" style="width: 100%;"/></div>
</div>
</section>
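<!--
Illustrative only, not part of the OR-Bench tooling: a minimal sketch of how the
two rejection rates plotted above could be computed from per-prompt results. The
record fields "set" and "refused" are hypothetical.

function rejectionRates(results) {
// results: array of { set: "seemingly_toxic" | "toxic", refused: true | false }
var totals = { seemingly_toxic: 0, toxic: 0 };
var refusals = { seemingly_toxic: 0, toxic: 0 };
results.forEach(function (r) {
totals[r.set] += 1;
if (r.refused) refusals[r.set] += 1;
});
return {
x: refusals.seemingly_toxic / totals.seemingly_toxic, // lower is better
y: refusals.toxic / totals.toxic // higher is better
};
}
-->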
<div class="divider"><hr /></div>
<section class="container" id="div_cifar10_ipc1_heading">
<div id="div_or_bench" class="display responsive nowrap" style="width:100%"></div>
</section>
<div class="divider"><hr /></div>
<!-- <script
type="module"
src="https://gradio.s3-us-west-2.amazonaws.com/4.31.0/gradio.js"
></script> -->
<div><b>Please try out our demos below 🚀</b></div>
<div class="iframe-container">
<iframe
id="myIframe"
src="https://bench-llm-or-bench.hf.space"
frameborder="0"
width="2160"
height="450"
></iframe>
</div>
<div class="vspace50"></div>
<div id="new_result" style="width:100%">
<a class="btn btn-secondary" style="width:100%" href="https://forms.gle/WEeSVBENFWMXMuYK8" target="_blank">Submit New Results <img src="images/click.png" width="20"/></a>
</div>
<section id="citation" >
<div class="heading">
<p>Citation</p>
</div>
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #555555; font-weight: bold">@article</span>{cui2024or,
title<span style="color: #333333">=</span>{OR<span style="color: #333333">-</span>Bench: An Over<span style="color: #333333">-</span>Refusal Benchmark <span style="color: #008800; font-weight: bold">for</span> Large Language Models},
author<span style="color: #333333">=</span>{Cui, Justin <span style="color: #000000; font-weight: bold">and</span> Chiang, Wei<span style="color: #333333">-</span>Lin <span style="color: #000000; font-weight: bold">and</span> Stoica, Ion <span style="color: #000000; font-weight: bold">and</span> Hsieh, Cho<span style="color: #333333">-</span>Jui},
journal<span style="color: #333333">=</span>{arXiv preprint arXiv:<span style="color: #6600EE; font-weight: bold">2405.20947</span>},
year<span style="color: #333333">=</span>{<span style="color: #0000DD; font-weight: bold">2024</span>}
}
</pre></div>
</section>
<div class="vspace50"></div>
</div>
<hr class="bottomrule" />
<footer>
<small>&copy; 2024, OR-Bench</small>
</footer>
<script>
// When the user scrolls the page, execute myFunction
window.onscroll = function () {
myFunction();
};
// Get the navbar
var navbar = document.getElementById("navbar");
// Get the offset position of the navbar
var sticky = navbar.offsetTop;
// Add the sticky class to the navbar when you reach its scroll position. Remove "sticky" when you leave the scroll position
function myFunction() {
if (window.pageYOffset >= sticky) {
navbar.classList.add("sticky");
} else {
navbar.classList.remove("sticky");
}
}
</script>
<script>
$("#div_or_bench").load("./data/or-bench.html", function() {
$('#or-bench-table').DataTable({
"pageLength": 25, // Set the initial number of entries
"lengthMenu": [[10, 25, 50, -1], [10, 25, 50, "All"]], // Set options for lengthMenu
"order": [[3, "asc"]], // Sort by the third column (index 2) in descending order
"paging": false, // Disables pagination
"responsive": true // Enable responsive feature
});
});
</script>
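<!--
Illustrative only, not the actual data file: the loader above assumes that
./data/or-bench.html contains a table with id "or-bench-table" and that its
fourth column (index 3) is the one sorted on. A minimal sketch of such a
fragment, with hypothetical column names:

<table id="or-bench-table" class="display responsive nowrap" style="width:100%">
<thead>
<tr><th>Model</th><th>Toxic Prompt Rejection Rate</th><th>Hard Prompt Rejection Rate</th><th>Seemingly Toxic Rejection Rate</th></tr>
</thead>
<tbody>
<tr><td>example-model</td><td>0.98</td><td>0.35</td><td>0.12</td></tr>
</tbody>
</table>
-->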
</body>
</html>