Shared Task: Mozilla Common Voice Spontaneous Speech ASR
https://www.codabench.org/competitions/10820/
Inference code and weights of the 1st place solution in 3 of 4 subtasks:
- Multilingual General
- Best small model
- Unseen Languages
Please see paper.pdf and solution.ipynb for the details and code entry point.
Training code is available in the following repository:
https://huggingface.co/vecxoz/mozilla-shared-task-1st-place-mms-training
Author: Igor Ivanov (team "vecxoz")
This repository contains both fine-tuned model weights and
the inference code used to obtain the winning scores.
Code and weights are licensed separately.
License for code: MIT
License for weights: CC-BY-NC-4.0
The license for the SCTK distribution can be found in the corresponding subdirectory.
The model weights are derivative works of the following models,
obtained by fine-tuning them on the Common Voice datasets:
https://huggingface.co/facebook/mms-1b-fl102
https://huggingface.co/facebook/mms-1b-l1107
https://huggingface.co/facebook/mms-1b-all
The test dataset is not included, in accordance with the Common Voice requirements. It is available at the following link:
https://datacollective.mozillafoundation.org/datasets/cminc35no007no707hql26lzk
The directory structure of the dataset is as follows:
mozilla-shared-task-1st-place-mms-inference
|
|-- mdc_asr_shared_task_test_data
    |
    |-- audios
    |   |-- spontaneous-speech-ady-67085.mp3
    |   |-- ...
    |   |-- spontaneous-speech-ush-39974.mp3
    |
    |-- multilingual-general
    |   |-- aln.tsv
    |   |-- ...
    |   |-- ukv.tsv
    |
    |-- small-model
    |   |-- ady.tsv
    |   |-- ...
    |   |-- ush.tsv
    |
    |-- unseen-langs
        |-- ady.tsv
        |-- ...
        |-- ush.tsv
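
After downloading the test dataset, it can help to confirm that it sits where the tree above shows before opening solution.ipynb. The following is a minimal sketch, not part of the original solution: the helper names check_layout and summarize are hypothetical, and the script assumes it is run from the repository root. Only the directory names and the .mp3/.tsv extensions are taken from the tree above.

```python
from pathlib import Path

# Directory names copied from the tree above.
DATASET_DIR = "mdc_asr_shared_task_test_data"
SUBTASK_DIRS = ("multilingual-general", "small-model", "unseen-langs")


def check_layout(repo_root):
    """Return the expected dataset directories that are missing under repo_root."""
    root = Path(repo_root)
    data_root = root / DATASET_DIR
    expected = [data_root / "audios"] + [data_root / name for name in SUBTASK_DIRS]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_dir()]


def summarize(repo_root):
    """Count .mp3 files in audios and .tsv files in each subtask directory."""
    data_root = Path(repo_root) / DATASET_DIR
    counts = {"audios": len(list((data_root / "audios").glob("*.mp3")))}
    for name in SUBTASK_DIRS:
        counts[name] = len(list((data_root / name).glob("*.tsv")))
    return counts


if __name__ == "__main__":
    repo_root = Path(".")  # assumption: current directory is the repository root
    missing = check_layout(repo_root)
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print(summarize(repo_root))
```

If any directory is reported as missing, the dataset was most likely unpacked to a different location than the one shown in the tree.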