Jay committed on
Commit 58e363c
1 Parent(s): 16a4f17

update text

Files changed (1)
  1. assets/text.py +3 -3
assets/text.py CHANGED
@@ -7,25 +7,25 @@ On this leaderboard, we share the evaluation results of LLMs obtained by develop
 # Dataset
 <span style="font-size:16px; font-family: 'Times New Roman', serif">
 To evaluate the safety risk of large language models (LLMs), we present ChineseSafe, a Chinese safety benchmark to facilitate research
-on the content safety of large language models for Chinese (Mandarin).
+on the content safety of LLMs for Chinese (Mandarin).
 To align with the regulations for Chinese Internet content moderation, our ChineseSafe contains 205,034 examples
 across 4 classes and 10 sub-classes of safety issues. For Chinese contexts, we add several special types of illegal content: political sensitivity, pornography,
 and variant/homophonic words. In particular, the benchmark is constructed as a balanced dataset, containing safe and unsafe data collected from internet resources and public datasets [1,2,3].
 We hope the evaluation can provide a guideline for developers and researchers to facilitate the safety of LLMs. <br>
 
 The leaderboard is under construction and maintained by <a href="https://hongxin001.github.io/" target="_blank">Hongxin Wei's</a> research group at SUSTech.
-We will release the technical report in the near future.
 Comments, issues, contributions, and collaborations are all welcome!
 Email: weihx@sustech.edu.cn
 </span>
 """ # noqa
+# We will release the technical report in the near future.
 
 METRICS_TEXT = """
 # Metrics
 <span style="font-size:16px; font-family: 'Times New Roman', serif">
 We report the results with five metrics: overall accuracy, and precision/recall for safe/unsafe content.
 In particular, the results are shown in <b>metric/std</b> format in the table,
-where <b>std</b> indicates the standard deviation of the results obtained from different random seeds.
+where <b>std</b> indicates the standard deviation of the results with various random seeds.
 </span>
 """ # noqa
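
For context, a minimal sketch of how the <b>metric/std</b> cells described in the METRICS_TEXT could be produced, assuming each model is evaluated under several random seeds; the helper names (precision_recall, metric_std_cell) and the toy numbers are hypothetical illustrations, not part of this repository:

# Hypothetical sketch, not the leaderboard's actual evaluation code.
from statistics import mean, stdev

def precision_recall(preds, golds, positive):
    # Precision/recall treating `positive` ("safe" or "unsafe") as the target class.
    tp = sum(p == positive and g == positive for p, g in zip(preds, golds))
    fp = sum(p == positive and g != positive for p, g in zip(preds, golds))
    fn = sum(p != positive and g == positive for p, g in zip(preds, golds))
    return (tp / (tp + fp) if tp + fp else 0.0,
            tp / (tp + fn) if tp + fn else 0.0)

def metric_std_cell(per_seed_values):
    # One table cell: mean of the metric and its standard deviation across seeds.
    return f"{mean(per_seed_values):.1f}/{stdev(per_seed_values):.1f}"

# Per-class precision/recall on one seed's predictions (toy data).
preds = ["unsafe", "safe", "unsafe", "safe"]
golds = ["unsafe", "safe", "safe", "safe"]
p_unsafe, r_unsafe = precision_recall(preds, golds, "unsafe")  # 0.5, 1.0

# Overall accuracy (in percent) from three seeds, formatted as one cell.
print(metric_std_cell([85.1, 85.6, 85.2]))  # 85.3/0.3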