James Kassemi committed on
Commit
3511def
•
1 Parent(s): 7de393f

Fix TypeError during data collection

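The dataset `args` field can be a plain string that merely contains the
text "language" (for example `'language hi'`) instead of the expected
dict with a `"language"` key. The existing `"language" in ...` check
passes on such a string as a substring match, so the code then indexes
the string with `"language"`. The application errors with "TypeError:
string indices must be integers" and then fails to load.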
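
For reviewers, here is a minimal reproduction sketch of the failure. It is not code from `app.py`; the `dataset` dict simply mirrors the `"dataset"` entry of the failing record, and the variable names are illustrative.

```python
# Mirrors the "dataset" entry of the failing record (illustrative copy).
dataset = {
    "type": "mozilla-foundation/common_voice_11_0",
    "args": "language hi",  # a string, not the expected dict
}

# On a string, `in` is a substring test, so the original guard passes.
guard_passes = "args" in dataset and "language" in dataset["args"]

# Indexing a string with a string key is what raises the TypeError.
try:
    lang = dataset["args"]["language"]
    error = None
except TypeError as exc:
    error = exc

print(guard_passes, type(error).__name__)
```

The exact wording of the exception message varies between Python versions, so the sketch only inspects the exception type.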

This fix checks the type of `args` and ensures that it's a dict. If not,
it uses the previously developed default behavior: using the
`"language"` value from the model's metadata.
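
As a rough illustration of the guarded lookup, the sketch below pulls the same logic into a standalone function. `resolve_language` is a hypothetical helper written for this note and does not exist in `app.py`; the `bad` and `good` inputs are trimmed-down assumptions modeled on the record below.

```python
def resolve_language(result, meta):
    """Hypothetical helper: pick the language for one model-index result.

    Uses the dataset-level value only when `args` is a dict that has a
    "language" key; otherwise falls back to the model-level metadata.
    """
    args = result["dataset"].get("args")
    if isinstance(args, dict) and "language" in args:
        return args["language"]
    return meta["language"]


meta = {"language": ["hi"]}

# Non-standard record: `args` is a string, so the metadata value is used.
bad = {"dataset": {"type": "mozilla-foundation/common_voice_11_0",
                   "args": "language hi"}}
# Expected shape: `args` is a dict with a "language" key.
good = {"dataset": {"type": "mozilla-foundation/common_voice_11_0",
                    "args": {"language": "hi"}}}

fallback_lang = resolve_language(bad, meta)
dataset_lang = resolve_language(good, meta)
print(fallback_lang, dataset_lang)
```

Note that the fallback returns whatever the metadata holds, which for this record is a list (`['hi']`) and not a bare string.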

I'm happy to reach out to the one model owner with the non-standard
configuration, though it does look like it may have been generated by
🤗 Trainer: https://huggingface.co/sanchit-gandhi/whisper-small-hi/edit/main/README.md.

Here's the record that causes the error in production:

```
meta: {'language': ['hi'], 'license': 'apache-2.0', 'tags': ['hf-asr-leaderboard', 'generated_from_trainer'], 'datasets': ['mozilla-foundation/common_voice_11_0'], 'metrics': ['wer'], 'model-index': [{'name': 'Whisper Small Hi - Sanchit Gandhi', 'results': [{'task': {'name': 'Automatic Speech Recognition', 'type': 'automatic-speech-recognition'}, 'dataset': {'name': 'Common Voice 11.0', 'type': 'mozilla-foundation/common_voice_11_0', 'config': 'hi', 'split': 'test', 'args': 'language hi'}, 'metrics': [{'name': 'Wer', 'type': 'wer', 'value': 32.09599593667993}]}]}]}
result["dataset"]: {'name': 'Common Voice 11.0', 'type': 'mozilla-foundation/common_voice_11_0', 'config': 'hi', 'split': 'test', 'args': 'language hi'}
```

Fixes huggingface/hf-speech-bench#10 and possibly
Fixes huggingface/hf-speech-bench#9.

According to huggingface/hf-speech-bench#8 as of two months ago, users
are reporting that the leaderboard has moved, but this repository is
still seeing staff contributions. Submitting fix for review regardless.

Files changed (1)
  1. app.py +1 -1
app.py CHANGED
@@ -68,7 +68,7 @@ def parse_metrics_rows(meta):
         if "dataset" not in result or "metrics" not in result:
             continue
         dataset = result["dataset"]["type"]
-        if "args" in result["dataset"] and "language" in result["dataset"]["args"]:
+        if "args" in result["dataset"] and isinstance(result["dataset"]["args"], dict) and "language" in result["dataset"]["args"]:
             lang = result["dataset"]["args"]["language"]
         else:
             lang = meta["language"]