Update src/streamlit_app.py
src/streamlit_app.py  CHANGED  (+110 -28)
@@ -321,17 +321,21 @@ with tab7:
  - **Accuracy (1–5 scale):** How correctly the model transcribed the audio.
  - **Meaning Preservation (1–5 scale):** Whether the transcription retained the original meaning.
  - **Orthography:** Whether the transcription followed standard writing conventions, including accents, diacritics, and special characters.
- - **
-
-
-
-
-
-
-
-
  """)

  # --- Setup ---
  st.subheader("Evaluation Setup")
  st.write("""
@@ -339,41 +343,119 @@ with tab7:
  - **Participants:** 20 evaluators (native speakers or fluent linguists), aged 18–50, majority with postgraduate education.
  - **Platform:** A Gradio-based interface allowed evaluators to upload/record audio, view transcriptions, and complete the feedback form directly online.
  """)
-
  # --- Findings ---
  st.subheader("Findings")
  st.write("""
  - **High-Performing Languages:**
-
-
-
-
-
  - **Moderate Performance:**
-   Hausa, Oromo, Bemba, Yoruba, and

- - **Low-Performing Languages:**
-
-
-
-   These suffered from limited training data, frequent substitution/omission errors, and poor handling of named entities.
  """)

  # --- Error Patterns ---
  st.subheader("Common Error Patterns")
  st.write("""
-
-
-
-
  """)

  # --- Takeaways ---
  st.subheader("Takeaways")
  st.write("""
  - Human ratings generally aligned with automatic metrics: languages with larger datasets (Swahili, Luganda, Amharic) scored highest.
- - Language models (LMs) were most effective in **low-data regimes (<50 hours)**, improving readability and accuracy.
  - WER alone misses issues such as meaning drift, orthography violations, and named entity errors.
- - More curated, domain-diverse training data is needed for low-performing languages such as Igbo and Afrikaans.
- - Human evaluation remains essential for **user-facing ASR systems**, where usability depends on meaning preservation and fluency, not just raw error rates.
  """)
  - **Accuracy (1–5 scale):** How correctly the model transcribed the audio.
  - **Meaning Preservation (1–5 scale):** Whether the transcription retained the original meaning.
  - **Orthography:** Whether the transcription followed standard writing conventions, including accents, diacritics, and special characters.
+ - **Recording Environment:** Evaluators noted the type of environment (quiet, professional studio, or noisy background), since background noise impacts ASR performance.
+ - **Device Used:** Information on whether the recording was made with a mobile phone, laptop microphone, or dedicated mic, as device quality affects clarity.
+ - **Domain/Topic of Speech:** Evaluators indicated whether the speech belonged to a specific topic such as education, health, law, or everyday conversation, to assess domain adaptability.
+ - **Error Types:** Evaluators identified common error categories, such as:
+   - Substitutions (wrong words used)
+   - Omissions (missing words)
+   - Insertions (extra words added)
+   - Pronunciation-related errors
+   - Diacritic/Tone/Special character errors
+   - Named Entity errors (people, places, currencies)
+   - Punctuation errors
+ - **Performance Description:** Free text in which evaluators described strengths and weaknesses of the models in their own words.
  """)
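# Illustrative sketch (hypothetical, not part of this commit): one way a single
# evaluation record covering the criteria above might look; the field names here
# are assumptions, not taken from the app.
example_record = {
    "accuracy": 4,                          # 1–5 scale
    "meaning_preservation": 5,              # 1–5 scale
    "orthography_ok": False,                # diacritics / special characters respected?
    "recording_environment": "noisy background",
    "device": "mobile phone",
    "domain": "health",
    "error_types": ["Omissions", "Diacritic/Tone/Special character errors"],
    "performance_description": "Drops tone marks and merges short words.",
}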

+
  # --- Setup ---
  st.subheader("Evaluation Setup")
  st.write("""
  - **Participants:** 20 evaluators (native speakers or fluent linguists), aged 18–50, majority with postgraduate education.
  - **Platform:** A Gradio-based interface allowed evaluators to upload/record audio, view transcriptions, and complete the feedback form directly online.
  """)
+ st.subheader("Evaluator Contributions")
+ data = [
+     {"Evaluator ID": "eval_001", "Contributions": 65, "Languages": "Afrikaans"},
+     {"Evaluator ID": "eval_002", "Contributions": 50, "Languages": "Afrikaans"},
+     {"Evaluator ID": "eval_005", "Contributions": 63, "Languages": "Amharic"},
+     {"Evaluator ID": "eval_006", "Contributions": 69, "Languages": "Amharic"},
+     {"Evaluator ID": "eval_007", "Contributions": 50, "Languages": "Bemba"},
+     {"Evaluator ID": "eval_008", "Contributions": 53, "Languages": "Bemba"},
+     {"Evaluator ID": "eval_009", "Contributions": 60, "Languages": "Hausa"},
+     {"Evaluator ID": "eval_010", "Contributions": 53, "Languages": "Igbo"},
+     {"Evaluator ID": "eval_011", "Contributions": 12, "Languages": "Lingala"},
+     {"Evaluator ID": "eval_012", "Contributions": 115, "Languages": "Oromo"},
+     {"Evaluator ID": "eval_014", "Contributions": 52, "Languages": "Wolof"},
+     {"Evaluator ID": "eval_015", "Contributions": 8, "Languages": "Xhosa"},
+     {"Evaluator ID": "eval_017", "Contributions": 59, "Languages": "Yoruba"},
+     {"Evaluator ID": "eval_018", "Contributions": 58, "Languages": "Yoruba"},
+     {"Evaluator ID": "eval_019", "Contributions": 52, "Languages": "Luganda"},
+     {"Evaluator ID": "eval_020", "Contributions": 55, "Languages": "Luganda"},
+     {"Evaluator ID": "eval_021", "Contributions": 66, "Languages": "Swahili"},
+     {"Evaluator ID": "eval_022", "Contributions": 64, "Languages": "Swahili"},
+     {"Evaluator ID": "eval_023", "Contributions": 50, "Languages": "Kinyarwanda"},
+     {"Evaluator ID": "eval_024", "Contributions": 53, "Languages": "Kinyarwanda"},
+ ]
+
+ df_evaluators = pd.DataFrame(data)
+
+ st.dataframe(df_evaluators, width="stretch")
+
+ # Optional: also show totals
+ st.write("### Summary")
+ st.write(f"- **Total Evaluators:** {df_evaluators['Evaluator ID'].nunique()}")
+ st.write(f"- **Total Contributions:** {df_evaluators['Contributions'].sum()}")
+
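# Illustrative extension (hypothetical, not part of this commit): the same dataframe
# could also be aggregated per language and charted; assumes df_evaluators exists with
# the "Languages" and "Contributions" columns shown above.
per_language = (
    df_evaluators.groupby("Languages", as_index=False)["Contributions"]
    .sum()
    .sort_values("Contributions", ascending=False)
)
st.bar_chart(per_language, x="Languages", y="Contributions")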
  # --- Findings ---
  st.subheader("Findings")
+
  st.write("""
+ ASR performance varied significantly across languages, reflecting differences in data availability,
+ orthography complexity, and domain coverage. Below we summarize the average **Accuracy** and
+ **Meaning Preservation** scores (1–5 scale) by language.
+ """)
+
+ # Data table of results
+ results_data = [
+     {"Language": "Swahili", "Audios Evaluated": 132, "Accuracy": 4.96, "Meaning": 4.97},
+     {"Language": "Luganda", "Audios Evaluated": 110, "Accuracy": 4.70, "Meaning": 4.78},
+     {"Language": "Amharic", "Audios Evaluated": 132, "Accuracy": 4.65, "Meaning": 4.82},
+     {"Language": "Lingala", "Audios Evaluated": 30, "Accuracy": 4.63, "Meaning": 4.70},
+     {"Language": "Hausa", "Audios Evaluated": 60, "Accuracy": 4.58, "Meaning": 4.97},
+     {"Language": "Oromo", "Audios Evaluated": 115, "Accuracy": 4.54, "Meaning": 4.52},
+     {"Language": "Bemba", "Audios Evaluated": 116, "Accuracy": 4.39, "Meaning": 4.86},
+     {"Language": "Yoruba", "Audios Evaluated": 122, "Accuracy": 4.22, "Meaning": 4.48},
+     {"Language": "Wolof", "Audios Evaluated": 53, "Accuracy": 3.98, "Meaning": 4.13},
+     {"Language": "Kinyarwanda", "Audios Evaluated": 103, "Accuracy": 3.75, "Meaning": 4.81},
+     {"Language": "Xhosa", "Audios Evaluated": 8, "Accuracy": 3.62, "Meaning": 3.38},
+     {"Language": "Afrikaans", "Audios Evaluated": 116, "Accuracy": 3.59, "Meaning": 4.10},
+     {"Language": "Igbo", "Audios Evaluated": 55, "Accuracy": 2.25, "Meaning": 2.15},
+ ]
+
+ df_results = pd.DataFrame(results_data)
+ st.dataframe(df_results, width="stretch")
+
+ # Narrative summary
+ st.markdown("""
+ ### Key Takeaways
  - **High-Performing Languages:**
+   - Swahili (Accuracy 4.96, Meaning 4.97)
+   - Luganda (Accuracy 4.70, Meaning 4.78)
+   - Amharic (Accuracy 4.65, Meaning 4.82)
+   These models produced highly accurate transcriptions with minimal meaning loss.
+
  - **Moderate Performance:**
+   Hausa, Oromo, Bemba, Yoruba, Wolof, and Kinyarwanda – generally understandable, but often with orthography and punctuation issues.

+ - **Low-Performing Languages from evaluation:**
+   - Igbo (Accuracy 2.25, Meaning 2.15)
+   - Afrikaans (Accuracy 3.59, Meaning 4.10)
+   - Xhosa (Accuracy 3.62, Meaning 3.38)
  """)

  # --- Error Patterns ---
  st.subheader("Common Error Patterns")
+
  st.write("""
+ Evaluators highlighted several recurring challenges and areas for improvement across
+ different languages. These reflect both linguistic complexities and system limitations.
+ """)
+
+ error_data = [
+     {"Issue": "Punctuation and Formatting",
+      "Comments": "Absence of punctuation, lack of capitalisation"},
+     {"Issue": "Spelling and Grammar",
+      "Comments": "Word merging, frequent spelling mistakes in individual words"},
+     {"Issue": "Named Entity Recognition",
+      "Comments": "Inaccurate handling of numbers, currencies, and names"},
+     {"Issue": "Device Compatibility & Performance",
+      "Comments": "Better performance on laptops than on mobile phones"},
+ ]
+
+ df_errors = pd.DataFrame(error_data)
+ st.dataframe(df_errors, width="stretch")
+
+ st.markdown("""
+ ### Summary
+ 1. **Punctuation and formatting inconsistencies** make transcriptions harder to read.
+ 2. **Word merging and spelling errors** were frequent, particularly in morphologically rich languages.
+ 3. **Named entity recognition** (e.g., names, currencies, numbers) was a common source of error.
+ 4. **Platform performance** was reported as better on laptops than on mobile devices.
  """)

  # --- Takeaways ---
  st.subheader("Takeaways")
  st.write("""
  - Human ratings generally aligned with automatic metrics: languages with larger datasets (Swahili, Luganda, Amharic) scored highest.
  - WER alone misses issues such as meaning drift, orthography violations, and named entity errors.
  """)
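# Illustrative aside (hypothetical, not part of this commit): the last takeaway
# contrasts WER with human judgment. A single substitution can change the meaning
# entirely while barely moving WER; the sketch below uses the jiwer package,
# which is not used in this file.
import jiwer

reference  = "the clinic opens at nine in the morning"
hypothesis = "the clinic opens at night in the morning"
print(jiwer.wer(reference, hypothesis))  # 0.125: one word in eight, yet the meaning drifted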