danielz02 committed
Commit 8e98a05
1 Parent(s): 6dbf9c6

Change about information

Files changed (1): src/display/about.py (+19 -10)
src/display/about.py CHANGED
@@ -26,22 +26,25 @@ class Tasks(Enum):
 TITLE = """<h1 align="center" id="space-title">Safe LLM leaderboard</h1>"""
 
 # What does your leaderboard evaluate?
-INTRODUCTION_TEXT = """Powered by the DecodingTrust platform, which provides comprehensive safety and trustworthiness
-evaluation for LLMs, this leaderboard is designed to help researchers and practitioners better understand the
-capabilities, limitations, and potential risks of state-of-the-art Large Language Models (LLMs). See our paper for
-details. Access the DecodingTrust platform website [here](https://decodingtrust.github.io/)."""
+INTRODUCTION_TEXT = """The Safe LLM Leaderboard aims to provide a unified evaluation of LLM safety and to help
+researchers and practitioners better understand the capabilities, limitations, and potential risks of LLMs. Submit a
+model for evaluation on the “Submit” page! The leaderboard is generated by the trustworthiness evaluation
+platform [DecodingTrust](https://decodingtrust.github.io/)."""
 
 # Which evaluations are you running? how can people reproduce what you have?
 LLM_BENCHMARKS_TEXT = f"""
-## How it works
-
-DecodingTrust aims at providing a thorough assessment of trustworthiness in GPT models.
-
-This research endeavor is designed to help researchers and practitioners better understand the capabilities,
-limitations, and potential risks involved in deploying these state-of-the-art Large Language Models (LLMs).
-
-This project is organized around the following eight primary perspectives of trustworthiness, including:
+## Context
+
+As LLMs have demonstrated impressive capabilities and are being deployed in high-stakes domains such as healthcare,
+transportation, and finance, understanding the safety, limitations, and potential risks of LLMs is crucial.
+
+## How it works
+
+This leaderboard is powered by the DecodingTrust platform, which provides comprehensive safety and trustworthiness
+evaluation for LLMs. More details about the paper, which won the Outstanding Paper Award at NeurIPS 2023, and about
+the platform can be found [here](https://decodingtrust.github.io/).
+
+DecodingTrust aims to provide a comprehensive risk and trustworthiness assessment for LLMs. It currently covers the
+following eight primary perspectives of trustworthiness:
 + Toxicity
 + Stereotype and bias
 + Adversarial robustness
@@ -51,6 +54,12 @@ This project is organized around the following eight primary perspectives of tru
 + Machine Ethics
 + Fairness
 
+We normalize the evaluation score of each perspective to a 0-100 scale, where higher is better.
+
+Examples of these vulnerabilities are shown below.
+![DecodingTrust Overview](https://decodingtrust.github.io/overview.png)
 
 We normalize the score of each perspective as 0-100, and these scores are the higher the better.
 
 ## Reproducibility
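
The text above states that each perspective's score is normalized to 0-100 (higher is better) but does not show how that mapping is computed. A minimal sketch of one plausible approach, min-max scaling; the function `normalize_score` and the score bounds are hypothetical illustrations, not part of this commit or the DecodingTrust codebase:

```python
# Hypothetical sketch: min-max scaling of a raw perspective score to 0-100.
# Neither this function nor the bounds below appear in the commit; the actual
# DecodingTrust normalization may differ.

def normalize_score(raw: float, lo: float, hi: float) -> float:
    """Map a raw score in [lo, hi] to the 0-100 scale, where higher is better."""
    if hi <= lo:
        raise ValueError("upper bound must exceed lower bound")
    clipped = min(max(raw, lo), hi)  # guard against out-of-range inputs
    return 100.0 * (clipped - lo) / (hi - lo)

# Example: a perspective scored 0.82 on a 0-1 "safer is higher" scale maps to 82.0.
print(normalize_score(0.82, 0.0, 1.0))  # -> 82.0
```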