Commit 58e363c by Jay · update text
Parent: 16a4f17
Files changed: assets/text.py (+3 -3)

assets/text.py
@@ -7,25 +7,25 @@ On this leaderboard, we share the evaluation results of LLMs obtained by develop
 # Dataset
 <span style="font-size:16px; font-family: 'Times New Roman', serif">
 To evaluate the safety risks of large language models (LLMs), we present ChineseSafe, a Chinese safety benchmark to facilitate research
-on the content safety of
+on the content safety of LLMs for Chinese (Mandarin).
 To align with the regulations for Chinese Internet content moderation, our ChineseSafe contains 205,034 examples
 across 4 classes and 10 sub-classes of safety issues. For Chinese contexts, we add several special types of illegal content: political sensitivity, pornography,
 and variant/homophonic words. In particular, the benchmark is constructed as a balanced dataset, containing safe and unsafe data collected from internet resources and public datasets [1,2,3].
 We hope the evaluation can provide a guideline for developers and researchers to facilitate the safety of LLMs. <br>
 
 The leaderboard is under construction and maintained by <a href="https://hongxin001.github.io/" target="_blank">Hongxin Wei's</a> research group at SUSTech.
-We will release the technical report in the near future.
 Comments, issues, contributions, and collaborations are all welcome!
 Email: weihx@sustech.edu.cn
 </span>
 """ # noqa
+# We will release the technical report in the near future.
 
 METRICS_TEXT = """
 # Metrics
 <span style="font-size:16px; font-family: 'Times New Roman', serif">
 We report the results with five metrics: overall accuracy, and precision/recall for safe/unsafe content.
 In particular, the results are shown in <b>metric/std</b> format in the table,
-where <b>std</b> indicates the standard deviation of the results
+where <b>std</b> indicates the standard deviation of the results with various random seeds.
 </span>
 """ # noqa
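For context on the metric/std cells described in the METRICS_TEXT change: below is a minimal sketch of how such cells could be produced, assuming per-seed predictions are scored and then aggregated. The function names (binary_metrics, format_cells), the label convention (1 = safe, 0 = unsafe), and the toy data are hypothetical, not the leaderboard's actual evaluation code.

from statistics import mean, stdev

def binary_metrics(y_true, y_pred):
    # Overall accuracy plus precision/recall for the safe and unsafe classes.
    # Hypothetical label convention: 1 = safe, 0 = unsafe.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision_safe": tp / (tp + fp) if tp + fp else 0.0,
        "recall_safe": tp / (tp + fn) if tp + fn else 0.0,
        "precision_unsafe": tn / (tn + fn) if tn + fn else 0.0,
        "recall_unsafe": tn / (tn + fp) if tn + fp else 0.0,
    }

def format_cells(per_seed_results):
    # Aggregate per-seed metric dicts into "mean/std" table cells.
    cells = {}
    for name in per_seed_results[0]:
        values = [r[name] for r in per_seed_results]
        cells[name] = f"{mean(values):.4f}/{stdev(values):.4f}"
    return cells

# Toy usage with predictions from three hypothetical random seeds:
labels = [1, 0, 1, 0]
runs = [binary_metrics(labels, preds)
        for preds in ([1, 0, 1, 1], [1, 0, 0, 0], [1, 1, 1, 0])]
print(format_cells(runs))  # e.g. accuracy cell: "0.7500/0.0000"

The sketch uses only the standard library to stay self-contained; a real evaluation pipeline would more plausibly rely on numpy or scikit-learn for the same aggregation.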