Update src/about.py
src/about.py (CHANGED, +17 -17)
@@ -42,25 +42,25 @@ Addressing the gaps in existing LLM evaluation frameworks, this benchmark is spe
 3. Naturally collected data (reflecting indigenous cultural nuances)
 
 ### Key Datasets in the Benchmark
-The benchmark integrates the following datasets to ensure a robust evaluation of Persian LLMs:
-**Translated Datasets**
-• Anthropic-fa
-• AdvBench-fa
-
-• DecodingTrust-fa
-**Newly Developed Persian Datasets**
-• ProhibiBench-fa: Evaluates harmful and prohibited content in Persian culture.
-• SafeBench-fa: Assesses safety in generated outputs.
-• FairBench-fa: Measures bias mitigation in Persian LLMs.
-• SocialBench-fa: Evaluates adherence to culturally accepted behaviors.
-**Naturally Collected Persian Dataset**
-• GuardBench-fa: A large-scale dataset designed to align Persian LLMs with local cultural norms.
+> The benchmark integrates the following datasets to ensure a robust evaluation of Persian LLMs:
+> **Translated Datasets**
+> • Anthropic-fa
+> • AdvBench-fa
+> • HarmBench-fa
+> • DecodingTrust-fa
+> **Newly Developed Persian Datasets**
+> • ProhibiBench-fa: Evaluates harmful and prohibited content in Persian culture.
+> • SafeBench-fa: Assesses safety in generated outputs.
+> • FairBench-fa: Measures bias mitigation in Persian LLMs.
+> • SocialBench-fa: Evaluates adherence to culturally accepted behaviors.
+> **Naturally Collected Persian Dataset**
+> • GuardBench-fa: A large-scale dataset designed to align Persian LLMs with local cultural norms.
 
 ### A Unified Framework for Persian LLM Evaluation
-By combining these datasets, our work establishes a culturally grounded alignment evaluation framework, enabling systematic assessment across three key aspects:
-• Safety
-• Fairness
-• Social Norms
+> By combining these datasets, our work establishes a culturally grounded alignment evaluation framework, enabling systematic assessment across three key aspects:
+> • **Safety**: Avoiding harmful or toxic content.
+> • **Fairness**: Mitigating biases in model outputs.
+> • **Social Norms**: Ensuring culturally appropriate behavior.
 
 
 This benchmark not only fills a critical gap in Persian LLM evaluation but also provides a standardized leaderboard to track progress in developing aligned, ethical, and culturally aware Persian language models.