danielz02 committed
Commit 8e98a05
1 Parent(s): 6dbf9c6

Change about information

Files changed (1): src/display/about.py (+19 -10)
src/display/about.py CHANGED
@@ -26,22 +26,25 @@ class Tasks(Enum):
 TITLE = """<h1 align="center" id="space-title">Safe LLM leaderboard</h1>"""
 
 # What does your leaderboard evaluate?
-INTRODUCTION_TEXT = """Powered by the DecodingTrust platform, which provides comprehensive safety and trustworthiness
-evaluation for LLMs, this leaderboard is designed to help researchers and practitioners better understand the
-capabilities, limitations, and potential risks of state-of-the-art Large Language Models (LLMs). See our paper for
-details. Access the DecodingTrust platform website [here](https://decodingtrust.github.io/)."""
+INTRODUCTION_TEXT = """The Safe LLM Leaderboard aims to provide a unified evaluation of LLM safety and to help
+researchers and practitioners better understand the capabilities, limitations, and potential risks of LLMs. Submit a
+model for evaluation on the “Submit” page! The leaderboard is generated by the trustworthiness evaluation
+platform [DecodingTrust](https://decodingtrust.github.io/)."""
 
 # Which evaluations are you running? how can people reproduce what you have?
 LLM_BENCHMARKS_TEXT = f"""
-## How it works
-
-DecodingTrust aims at providing a thorough assessment of trustworthiness in GPT models.
-
-This research endeavor is designed to help researchers and practitioners better understand the capabilities,
-limitations, and potential risks involved in deploying these state-of-the-art Large Language Models (LLMs).
-
-This project is organized around the following eight primary perspectives of trustworthiness, including:
+## Context
+
+As LLMs have demonstrated impressive capabilities and are being deployed in high-stakes domains such as healthcare,
+transportation, and finance, understanding the safety, limitations, and potential risks of LLMs is crucial.
+
+## How it works
+
+This leaderboard is powered by the DecodingTrust platform, which provides comprehensive safety and trustworthiness
+evaluation for LLMs. More details about the paper, which won the Outstanding Paper Award at NeurIPS 2023, and about
+the platform can be found [here](https://decodingtrust.github.io/).
+
+DecodingTrust aims to provide a comprehensive risk and trustworthiness assessment for LLMs. It currently covers the
+following eight primary perspectives of trustworthiness:
 + Toxicity
 + Stereotype and bias
 + Adversarial robustness
@@ -51,6 +54,12 @@ This project is organized around the following eight primary perspectives of tru
 + Machine Ethics
 + Fairness
 
+We normalize the evaluation score of each perspective to a 0-100 scale, where higher is better.
+
+Examples of these vulnerabilities are shown below.
+![DecodingTrust Overview](https://decodingtrust.github.io/overview.png)
 
 We normalize the score of each perspective as 0-100, and these scores are the higher the better.
 
 ## Reproducibility
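
The text above states that each perspective's score is normalized to 0-100 (higher is better) but does not show how that mapping is computed. A minimal sketch of one plausible approach, min-max scaling; the function `normalize_score` and the score bounds are hypothetical illustrations, not part of this commit or the DecodingTrust codebase:

```python
# Hypothetical sketch: min-max scaling of a raw perspective score to 0-100.
# Neither this function nor the bounds below appear in the commit; the actual
# DecodingTrust normalization may differ.

def normalize_score(raw: float, lo: float, hi: float) -> float:
    """Map a raw score in [lo, hi] to the 0-100 scale, where higher is better."""
    if hi <= lo:
        raise ValueError("upper bound must exceed lower bound")
    clipped = min(max(raw, lo), hi)  # guard against out-of-range inputs
    return 100.0 * (clipped - lo) / (hi - lo)

# Example: a perspective scored 0.82 on a 0-1 "safer is higher" scale maps to 82.0.
print(normalize_score(0.82, 0.0, 1.0))  # -> 82.0
```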