Shared Task: Mozilla Common Voice Spontaneous Speech ASR

https://www.codabench.org/competitions/10820/

Inference code and weights of the 1st place solution in 3 of 4 subtasks:

Multilingual General
Best small model
Unseen Languages

Please see paper.pdf and solution.ipynb for the details and code entry point.

Training code is available in the following repository:
https://huggingface.co/vecxoz/mozilla-shared-task-1st-place-mms-training

Author: Igor Ivanov (team "vecxoz")

This repository contains both fine-tuned model weights and
the inference code used to obtain the winning scores.
Code and weights are licensed separately.
License for code: MIT
License for weights: CC-BY-NC-4.0
The license for the SCTK distribution can be found in the corresponding subdirectory.

The model weights were obtained by fine-tuning the following models
on the Common Voice datasets and are a derivative work of them:
https://huggingface.co/facebook/mms-1b-fl102
https://huggingface.co/facebook/mms-1b-l1107
https://huggingface.co/facebook/mms-1b-all

The test dataset is not included, in accordance with the Common Voice requirements. It is available at:
https://datacollective.mozillafoundation.org/datasets/cminc35no007no707hql26lzk

The expected directory structure of the dataset is as follows:

mozilla-shared-task-1st-place-mms-inference
|
|-- mdc_asr_shared_task_test_data
    |
    |-- audios
    |   |-- spontaneous-speech-ady-67085.mp3
    |   |-- ...
    |   |-- spontaneous-speech-ush-39974.mp3
    |
    |-- multilingual-general
    |   |-- aln.tsv
    |   |-- ...
    |   |-- ukv.tsv
    |
    |-- small-model
    |   |-- ady.tsv
    |   |-- ...
    |   |-- ush.tsv
    |
    |-- unseen-langs
        |-- ady.tsv
        |-- ...
        |-- ush.tsv
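
Since the test data has to be downloaded separately, it can help to confirm that it was unpacked into the layout shown above before running solution.ipynb. The following is a minimal sketch, not part of the original solution: the helper names check_dataset_layout and list_languages are hypothetical, and the idea that each .tsv file name stem is a language code is an assumption based on the file names in the tree.

```python
from pathlib import Path

# Subtask directory names as they appear in the tree above.
SUBTASK_DIRS = ("multilingual-general", "small-model", "unseen-langs")


def check_dataset_layout(root):
    """Return a list of problems found under `root`; an empty list means the layout matches."""
    root = Path(root)
    problems = []

    # The audio clips are expected in a single "audios" directory.
    audio_dir = root / "audios"
    if not audio_dir.is_dir():
        problems.append(f"missing directory: {audio_dir}")
    elif not any(audio_dir.glob("*.mp3")):
        problems.append(f"no .mp3 files in: {audio_dir}")

    # Each subtask directory is expected to hold at least one .tsv file.
    for name in SUBTASK_DIRS:
        subtask_dir = root / name
        if not subtask_dir.is_dir():
            problems.append(f"missing directory: {subtask_dir}")
        elif not any(subtask_dir.glob("*.tsv")):
            problems.append(f"no .tsv files in: {subtask_dir}")
    return problems


def list_languages(root, subtask):
    """Sorted .tsv file name stems for a subtask (assumed here to be language codes)."""
    return sorted(p.stem for p in (Path(root) / subtask).glob("*.tsv"))


if __name__ == "__main__":
    # Run from inside the repository directory.
    issues = check_dataset_layout("mdc_asr_shared_task_test_data")
    print("layout OK" if not issues else "\n".join(issues))
```

The check only looks at directory names and file extensions; it does not validate the contents of the .tsv files or the audio.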
