MCILAB committed (verified)
Commit d1beffc · 1 Parent(s): 90a84e7

Update src/about.py

Files changed (1)
  1. src/about.py +17 -17
src/about.py CHANGED
@@ -42,25 +42,25 @@ Addressing the gaps in existing LLM evaluation frameworks, this benchmark is spe
 3. Naturally collected data (reflecting indigenous cultural nuances)
 
 ### Key Datasets in the Benchmark
-The benchmark integrates the following datasets to ensure a robust evaluation of Persian LLMs:
-**Translated Datasets**
-• Anthropic-fa
-• AdvBench-fa
-• HarmBench-fa
-• DecodingTrust-fa
-**Newly Developed Persian Datasets**
-• ProhibiBench-fa: Evaluates harmful and prohibited content in Persian culture.
-• SafeBench-fa: Assesses safety in generated outputs.
-• FairBench-fa: Measures bias mitigation in Persian LLMs.
-• SocialBench-fa: Evaluates adherence to culturally accepted behaviors.
-**Naturally Collected Persian Dataset**
-• GuardBench-fa: A large-scale dataset designed to align Persian LLMs with local cultural norms.
+> The benchmark integrates the following datasets to ensure a robust evaluation of Persian LLMs:
+> **Translated Datasets**
+> • Anthropic-fa
+> • AdvBench-fa
+> • HarmBench-fa
+> • DecodingTrust-fa
+> **Newly Developed Persian Datasets**
+> • ProhibiBench-fa: Evaluates harmful and prohibited content in Persian culture.
+> • SafeBench-fa: Assesses safety in generated outputs.
+> • FairBench-fa: Measures bias mitigation in Persian LLMs.
+> • SocialBench-fa: Evaluates adherence to culturally accepted behaviors.
+> **Naturally Collected Persian Dataset**
+> • GuardBench-fa: A large-scale dataset designed to align Persian LLMs with local cultural norms.
 
 ### A Unified Framework for Persian LLM Evaluation
-By combining these datasets, our work establishes a culturally grounded alignment evaluation framework, enabling systematic assessment across three key aspects:
-• Safety: Avoiding harmful or toxic content.
-• Fairness: Mitigating biases in model outputs.
-• Social Norms: Ensuring culturally appropriate behavior.
+> By combining these datasets, our work establishes a culturally grounded alignment evaluation framework, enabling systematic assessment across three key aspects:
+>**Safety**: Avoiding harmful or toxic content.
+>**Fairness**: Mitigating biases in model outputs.
+>**Social Norms**: Ensuring culturally appropriate behavior.
 
 
 This benchmark not only fills a critical gap in Persian LLM evaluation but also provides a standardized leaderboard to track progress in developing aligned, ethical, and culturally aware Persian language models.
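
For context on how a dataset list like this usually surfaces in a leaderboard's `src/about.py`, the sketch below follows the common Hugging Face leaderboard-template pattern of a `Task` dataclass plus a `Tasks` enum. It is only an illustration, not this repository's actual code: the metric keys (`"score"`) and column labels are placeholder assumptions.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass
class Task:
    """One benchmark entry, following the leaderboard-template convention."""
    benchmark: str  # dataset identifier used when reading result files
    metric: str     # metric key to extract from each result file
    col_name: str   # column header displayed on the leaderboard


class Tasks(Enum):
    # Hypothetical mapping of the Persian datasets described above; the
    # "score" metric and display names are placeholders, not real values.
    prohibibench_fa = Task("ProhibiBench-fa", "score", "ProhibiBench-fa")
    safebench_fa = Task("SafeBench-fa", "score", "SafeBench-fa")
    fairbench_fa = Task("FairBench-fa", "score", "FairBench-fa")
    socialbench_fa = Task("SocialBench-fa", "score", "SocialBench-fa")
    guardbench_fa = Task("GuardBench-fa", "score", "GuardBench-fa")
```

In that same template, prose like the block edited in this commit is typically kept in a module-level string (for example `LLM_BENCHMARKS_TEXT`) that the Gradio app renders on the leaderboard's About tab, which is why the markdown lives inside `src/about.py`.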