# evaluation.py
import gradio as gr
def render_eval_info():
    text = r"""
The Iqra’Eval challenge provides a shared, transparent platform for benchmarking phoneme‑prediction systems on our open test set (“IqraEval/open_testset”).
**Submission Details**
– Submit a UTF‑8 CSV named **teamName_submission.csv** with exactly two columns:
1. **ID**: utterance identifier (e.g. “0000_0001”)
2. **Labels**: your predicted phoneme sequence (space‑separated)
```csv
ID,Labels
0000_0001,i n n a m a a y a …
0000_0002,m a a n a n s a …
```
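As a reference, here is a minimal sketch of writing such a file with Python's standard `csv` module (the IDs and phoneme sequences below are placeholders, not real predictions):

```python
import csv

# Hypothetical predictions: utterance ID -> space-separated phoneme sequence
predictions = {
    "0000_0001": "i n n a m a a",
    "0000_0002": "m a a n a n",
}

with open("teamName_submission.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "Labels"])  # exactly two columns, in this order
    for utt_id, phonemes in predictions.items():
        writer.writerow([utt_id, phonemes])
```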
**Evaluation Criteria**
– Leaderboard ranking is based on phoneme‑level **F1‑score**, computed via a two‑stage (detection + diagnostic) hierarchy:
1. **Detection (error vs. correct)**
- **TR (True Rejects)**: mispronounced phonemes correctly flagged
- **FA (False Accepts)**: mispronunciations missed
- **FR (False Rejects)**: correct phonemes wrongly flagged
- **TA (True Accepts)**: correct phonemes correctly passed
**Metrics:**
- **Precision** = `TR / (TR + FR)`
- **Recall** = `TR / (TR + FA)`
- **F1** = `2 · Precision · Recall / (Precision + Recall)`
2. **Diagnostic (substitution/deletion/insertion errors)**
See the **Metrics** tab for breakdown into:
- **DER**: Deletion Error Rate
- **IER**: Insertion Error Rate
- **SER**: Substitution Error Rate
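For reference, the detection‑stage metrics above reduce to a few lines of Python (the function name and the zero‑count handling are our own illustrative choices, not part of the official scorer):

```python
def detection_f1(tr, fa, fr):
    # Precision: of the phonemes flagged as mispronounced, how many truly were
    precision = tr / (tr + fr) if tr + fr else 0.0
    # Recall: of the true mispronunciations, how many were flagged
    recall = tr / (tr + fa) if tr + fa else 0.0
    # Harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```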
– Once we receive your file (email: **iqraeval-submissions@googlegroups.com**), your submission is auto‑evaluated and placed on the leaderboard.
"""
    return gr.Markdown(text, latex_delimiters=[{"left": "$", "right": "$", "display": True}])