Spaces:
Running
on
CPU Upgrade
Running
on
CPU Upgrade
danielz02
commited on
Change about information
Browse files- src/display/about.py +19 -10
src/display/about.py
CHANGED
@@ -26,22 +26,25 @@ class Tasks(Enum):
|
|
26 |
TITLE = """<h1 align="center" id="space-title">Safe LLM leaderboard</h1>"""
|
27 |
|
28 |
# What does your leaderboard evaluate?
|
29 |
-
INTRODUCTION_TEXT = """
|
30 |
-
|
31 |
-
|
32 |
-
|
33 |
|
34 |
# Which evaluations are you running? how can people reproduce what you have?
|
35 |
LLM_BENCHMARKS_TEXT = f"""
|
36 |
-
##
|
37 |
-
|
38 |
-
|
39 |
|
40 |
-
|
41 |
-
limitations, and potential risks involved in deploying these state-of-the-art Large Language Models (LLMs).
|
42 |
|
43 |
-
This
|
|
|
|
|
44 |
|
|
|
|
|
45 |
+ Toxicity
|
46 |
+ Stereotype and bias
|
47 |
+ Adversarial robustness
|
@@ -51,6 +54,12 @@ This project is organized around the following eight primary perspectives of tru
|
|
51 |
+ Machine Ethics
|
52 |
+ Fairness
|
53 |
|
|
|
|
|
|
|
|
|
|
|
|
|
54 |
We normalize the score of each perspective as 0-100, and these scores are the higher the better.
|
55 |
|
56 |
## Reproducibility
|
|
|
26 |
TITLE = """<h1 align="center" id="space-title">Safe LLM leaderboard</h1>"""
|
27 |
|
28 |
# What does your leaderboard evaluate?
|
29 |
+
INTRODUCTION_TEXT = """The Safe LLM Leaderboard aims to provide a unified evaluation for LLM safety and help
|
30 |
+
researchers and practitioners better understand the capabilities, limitations, and potential risks of LLMs. Submit a
|
31 |
+
model for evaluation on the “Submit” page! The leaderboard is generated based on the trustworthiness evaluation
|
32 |
+
platform [DecodingTrust](https://decodingtrust.github.io/)."""
|
33 |
|
34 |
# Which evaluations are you running? how can people reproduce what you have?
|
35 |
LLM_BENCHMARKS_TEXT = f"""
|
36 |
+
## Context As LLMs have demonstrated impressive capabilities and are being deployed in
|
37 |
+
high-stakes domains such as healthcare, transportation, and finance, understanding the safety, limitations,
|
38 |
+
and potential risks of LLMs is crucial.
|
39 |
|
40 |
+
## How it works
|
|
|
41 |
|
42 |
+
This leaderboard is powered by the DecodingTrust platform, which provides comprehensive safety and trustworthiness
|
43 |
+
evaluation for LLMs. More details about the paper, which has won the Outstanding Paper award at NeurIPs’23,
|
44 |
+
and the platform can be found here.
|
45 |
|
46 |
+
DecodingTrust aims to provide comprehensive risk and trustworthiness assessment for LLMs. Currently, it includes the
|
47 |
+
following eight primary perspectives of trustworthiness, including:
|
48 |
+ Toxicity
|
49 |
+ Stereotype and bias
|
50 |
+ Adversarial robustness
|
|
|
54 |
+ Machine Ethics
|
55 |
+ Fairness
|
56 |
|
57 |
+
We normalize the evaluation score of each perspective between 0-100, which means the higher the better.
|
58 |
+
|
59 |
+
Examples of these vulnerabilities are shown below.
|
60 |
+

|
61 |
+
|
62 |
+
|
63 |
We normalize the score of each perspective as 0-100, and these scores are the higher the better.
|
64 |
|
65 |
## Reproducibility
|