James Kassemi committed on Sep 19, 2023

Commit 3511def

1 Parent(s): 7de393f

Fix TypeError during data collection

The dataset `args` in a model's metadata can be a plain string (for
example `'language hi'`) rather than the expected dict with a
`"language"` key. On parsing this data, the application raises
"TypeError: string indices must be integers" and then fails to load.
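
The failure can be reproduced in isolation. The sketch below copies only the `"type"` and `"args"` fields from the failing record; the variable names are illustrative and not taken from `app.py`. It shows why the original guard does not help: `in` applied to a string is a substring test, so the check passes and the subsequent lookup indexes a string with a string.

```python
# Shape of the problematic "dataset" entry, reduced from the failing record.
dataset = {"type": "mozilla-foundation/common_voice_11_0", "args": "language hi"}

# `in` on a str is a substring test, so the original guard evaluates to True.
guard_passes = "args" in dataset and "language" in dataset["args"]

try:
    lang = dataset["args"]["language"]  # indexing a str with a str key
    error = None
except TypeError as exc:
    error = str(exc)

print(guard_passes)
print(error)
```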

This fix checks that `args` is a dict before indexing into it. If it is
not, the code falls back to the existing default behavior: using the
`"language"` value from the model's metadata.
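
As a rough sketch of the guarded lookup, the logic can be pulled into a small function. `resolve_language` is a hypothetical helper written for illustration (the actual change is a one-line edit to the existing `if` in `app.py`), and the dict-shaped `args` value used below is an assumed example of the expected format.

```python
def resolve_language(result, meta):
    """Hypothetical helper: prefer args["language"] only when args is a dict."""
    args = result["dataset"].get("args")
    if isinstance(args, dict) and "language" in args:
        return args["language"]
    # Fall back to the model-level metadata, as the existing code does.
    return meta["language"]


meta = {"language": ["hi"]}

# String-valued args, as in the failing record.
string_args = {"dataset": {"type": "common_voice", "args": "language hi"}}
# Dict-valued args (assumed example of the expected shape).
dict_args = {"dataset": {"type": "common_voice", "args": {"language": "hi"}}}
# No args at all.
no_args = {"dataset": {"type": "common_voice"}}

print(resolve_language(string_args, meta))
print(resolve_language(dict_args, meta))
print(resolve_language(no_args, meta))
```

With the type check in place, the string-valued and missing `args` cases both take the metadata fallback instead of raising.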

I'm happy to reach out to the one model owner with the non-standard
configuration, though it does look like it may have been generated by
🤗 Trainer: https://huggingface.co/sanchit-gandhi/whisper-small-hi/edit/main/README.md.

Here's the record that causes the error in production:

```
meta: {'language': ['hi'], 'license': 'apache-2.0', 'tags': ['hf-asr-leaderboard', 'generated_from_trainer'], 'datasets': ['mozilla-foundation/common_voice_11_0'], 'metrics': ['wer'], 'model-index': [{'name': 'Whisper Small Hi - Sanchit Gandhi', 'results': [{'task': {'name': 'Automatic Speech Recognition', 'type': 'automatic-speech-recognition'}, 'dataset': {'name': 'Common Voice 11.0', 'type': 'mozilla-foundation/common_voice_11_0', 'config': 'hi', 'split': 'test', 'args': 'language hi'}, 'metrics': [{'name': 'Wer', 'type': 'wer', 'value': 32.09599593667993}]}]}]}
result["dataset"]: {'name': 'Common Voice 11.0', 'type': 'mozilla-foundation/common_voice_11_0', 'config': 'hi', 'split': 'test', 'args': 'language hi'}
```

Fixes huggingface/hf-speech-bench#10 and possibly
Fixes huggingface/hf-speech-bench#9.

According to huggingface/hf-speech-bench#8, users reported two months
ago that the leaderboard has moved, but this repository is still
receiving staff contributions. Submitting this fix for review regardless.

Files changed (1)

app.py +1 -1

@@ -68,7 +68,7 @@ def parse_metrics_rows(meta):
         if "dataset" not in result or "metrics" not in result:
             continue
         dataset = result["dataset"]["type"]
-        if "args" in result["dataset"] and "language" in result["dataset"]["args"]:
+        if "args" in result["dataset"] and isinstance(result["dataset"]["args"], dict) and "language" in result["dataset"]["args"]:
             lang = result["dataset"]["args"]["language"]
         else:
             lang = meta["language"]