huggingface/hf-speech-bench · Fix TypeError during data collection

Language information in the dataset args is expected to be a dict with a "language" key, but it can instead be a plain string (for example 'language hi'). On parsing this data, the application errors with "TypeError: string indices must be integers" and then fails to load. This fix checks the type of args and only reads "language" from it when it is a dict. If it is not, it uses the previously developed default behavior: using the "language" value from the model's metadata.
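For reviewers, here is a minimal sketch of the kind of type check described above. It is not the exact diff: the helper name `resolve_language` is hypothetical, and taking the first entry when the metadata "language" value is a list is an assumption based on the record shown in this description.

```python
from typing import Any, Dict, Optional


def resolve_language(result: Dict[str, Any], meta: Dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: pick a language code for one model-index result."""
    args = result.get("dataset", {}).get("args")

    # Only index into args when it is a dict. A string such as 'language hi'
    # would otherwise raise "TypeError: string indices must be integers".
    if isinstance(args, dict) and "language" in args:
        return args["language"]

    # Fallback: use the "language" value from the model's metadata.
    # Assumption: the metadata may hold a list (e.g. ['hi']); take the first entry.
    lang = meta.get("language")
    if isinstance(lang, list):
        return lang[0] if lang else None
    return lang


# Usage with the shape of the failing record (string args, list-valued metadata).
meta = {"language": ["hi"]}
result = {"dataset": {"args": "language hi"}}
print(resolve_language(result, meta))
```

With the record from this description, the string args value is skipped and the language comes from the model metadata instead.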

I'm happy to reach out to the one model owner with the non-standard configuration, though it does look like it may have been generated by 🤗 Trainer: https://huggingface.co/sanchit-gandhi/whisper-small-hi/edit/main/README.md.

Here's the record that causes the error in production:

meta: {'language': ['hi'], 'license': 'apache-2.0', 'tags': ['hf-asr-leaderboard', 'generated_from_trainer'], 'datasets': ['mozilla-foundation/common_voice_11_0'], 'metrics': ['wer'], 'model-index': [{'name': 'Whisper Small Hi - Sanchit Gandhi', 'results': [{'task': {'name': 'Automatic Speech Recognition', 'type': 'automatic-speech-recognition'}, 'dataset': {'name': 'Common Voice 11.0', 'type': 'mozilla-foundation/common_voice_11_0', 'config': 'hi', 'split': 'test', 'args': 'language hi'}, 'metrics': [{'name': 'Wer', 'type': 'wer', 'value': 32.09599593667993}]}]}]}
result["dataset"]: {'name': 'Common Voice 11.0', 'type': 'mozilla-foundation/common_voice_11_0', 'config': 'hi', 'split': 'test', 'args': 'language hi'}

Fixes #10 and possibly Fixes #9. According to huggingface/hf-speech-bench#8, users were reporting as of two months ago that the leaderboard has moved, but this repository is still receiving staff contributions, so I'm submitting this fix for review regardless.