MCILAB committed
Commit 046d108 · verified · 1 Parent(s): d1beffc

Update src/about.py

Files changed (1): src/about.py (+9 -6)
src/about.py CHANGED
@@ -41,26 +41,29 @@ Addressing the gaps in existing LLM evaluation frameworks, this benchmark is spe
 2. Synthetically generated data (newly created for Persian LLMs)
 3. Naturally collected data (reflecting indigenous cultural nuances)

-### Key Datasets in the Benchmark
+## Key Datasets in the Benchmark
 > The benchmark integrates the following datasets to ensure a robust evaluation of Persian LLMs:
+>
 > **Translated Datasets**
 > • Anthropic-fa
 > • AdvBench-fa
-> • HarmBench-fa
+> • HarmBench-fa
 > • DecodingTrust-fa
+>
 > **Newly Developed Persian Datasets**
 > • ProhibiBench-fa: Evaluates harmful and prohibited content in Persian culture.
 > • SafeBench-fa: Assesses safety in generated outputs.
 > • FairBench-fa: Measures bias mitigation in Persian LLMs.
 > • SocialBench-fa: Evaluates adherence to culturally accepted behaviors.
+>
 > **Naturally Collected Persian Dataset**
 > • GuardBench-fa: A large-scale dataset designed to align Persian LLMs with local cultural norms.

 ### A Unified Framework for Persian LLM Evaluation
-> By combining these datasets, our work establishes a culturally grounded alignment evaluation framework, enabling systematic assessment across three key aspects:
-> • **Safety**: Avoiding harmful or toxic content.
-> • **Fairness**: Mitigating biases in model outputs.
-> • **Social Norms**: Ensuring culturally appropriate behavior.
+By combining these datasets, our work establishes a culturally grounded alignment evaluation framework, enabling systematic assessment across three key aspects:
+• **Safety**: Avoiding harmful or toxic content.
+• **Fairness**: Mitigating biases in model outputs.
+• **Social Norms**: Ensuring culturally appropriate behavior.


 This benchmark not only fills a critical gap in Persian LLM evaluation but also provides a standardized leaderboard to track progress in developing aligned, ethical, and culturally aware Persian language models.
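For quick reference, the dataset groupings described in the updated about text can be captured in a small registry. The sketch below is only illustrative: the constant names (`PERSIAN_ALIGNMENT_DATASETS`, `EVALUATION_ASPECTS`) and the dict layout are assumptions, not the actual contents of `src/about.py`; only the dataset names and the three evaluation aspects come from the text above.

```python
# Hypothetical registry of the benchmark's datasets, grouped by provenance.
# Dataset names are taken from the about text; everything else is illustrative.
PERSIAN_ALIGNMENT_DATASETS = {
    "translated": ["Anthropic-fa", "AdvBench-fa", "HarmBench-fa", "DecodingTrust-fa"],
    "synthetic": ["ProhibiBench-fa", "SafeBench-fa", "FairBench-fa", "SocialBench-fa"],
    "natural": ["GuardBench-fa"],
}

# The three aspects the unified framework evaluates.
EVALUATION_ASPECTS = ("safety", "fairness", "social_norms")

if __name__ == "__main__":
    # Print a short summary of the registry, one line per provenance group.
    for provenance, names in PERSIAN_ALIGNMENT_DATASETS.items():
        print(f"{provenance}: {', '.join(names)}")
```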