---
language:
- sv-SE
license: cc0-1.0
tags:
- automatic-speech-recognition
- mozilla-foundation/common_voice_8_0
- generated_from_trainer
- sv
- robust-speech-event
- model_for_talk
datasets:
- mozilla-foundation/common_voice_8_0
- marinone94/nst_sv
model-index:
- name: XLS-R-300M - Swedish
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_8_0
      type: mozilla-foundation/common_voice_8_0
      args: sv-SE
    metrics:
    - name: Test WER
      type: wer
      value: 16.98
    - name: Test CER
      type: cer
      value: 5.66
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: speech-recognition-community-v2/dev_data
      type: speech-recognition-community-v2/dev_data
      args: sv
    metrics:
    - name: Test WER
      type: wer
      value: 27.01
    - name: Test CER
      type: cer
      value: 13.14
---

This model is a fine-tuned version of [KBLab/wav2vec2-large-voxrex](https://huggingface.co/KBLab/wav2vec2-large-voxrex), trained first for 2 epochs on the MARINONE94/NST_SV - SV dataset (an 80% random split with seed 42, since the dataset currently has only a "train" split), and then for 50 epochs on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - SV-SE dataset ("train+validation" split). See run.sh for a complete overview of all the training steps.

NOTE: the first training step did not work as expected, so it might be useless or might even degrade performance. Further investigation and development is needed.
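
For reference, the sketch below shows one way the two training datasets described above could be prepared with the `datasets` library. The split fraction, seed, and split names come from the description above; the actual preprocessing in run.sh may differ.

```python
from datasets import load_dataset

# The NST Swedish dataset currently ships only a "train" split,
# so take an 80% random subset with seed 42 for training.
nst = load_dataset("marinone94/nst_sv", split="train")
nst_train = nst.train_test_split(train_size=0.8, seed=42)["train"]

# Common Voice 8.0 Swedish: train and validation splits combined.
cv_train = load_dataset(
    "mozilla-foundation/common_voice_8_0",
    "sv-SE",
    split="train+validation",
    use_auth_token=True,  # Common Voice 8.0 requires accepting the terms on the Hub
)
```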