---
language:
- sv-SE
license: cc0-1.0
tags:
- automatic-speech-recognition
- mozilla-foundation/common_voice_8_0
- generated_from_trainer
- sv
- robust-speech-event
- model_for_talk
datasets:
- mozilla-foundation/common_voice_8_0
- marinone94/nst_sv
model-index:
- name: XLS-R-300M - Swedish
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_8_0
      type: mozilla-foundation/common_voice_8_0
      args: sv-SE
    metrics:
    - name: Test WER
      type: wer
      value: 16.98
    - name: Test CER
      type: cer
      value: 5.66
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: speech-recognition-community-v2/dev_data
      type: speech-recognition-community-v2/dev_data
      args: sv
    metrics:
    - name: Test WER
      type: wer
      value: 27.01
    - name: Test CER
      type: cer
      value: 13.14
---

This model is a fine-tuned version of [KBLab/wav2vec2-large-voxrex](https://huggingface.co/KBLab/wav2vec2-large-voxrex), trained first for 2 epochs on the MARINONE94/NST_SV - SV dataset (an 80% random split with seed 42, since the dataset currently has only a "train" split), and then for 50 epochs on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - SV-SE dataset ("train+validation" split). See run.sh for a complete overview of all the training steps.

NOTE: the first training step did not work as expected, so it might be useless or might even degrade performance. Further investigation and development is needed.
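
For reference, the sketch below shows one way the two training datasets described above could be prepared with the `datasets` library. The split fraction, seed, and split names come from the description above; the actual preprocessing in run.sh may differ.

```python
from datasets import load_dataset

# The NST Swedish dataset currently ships only a "train" split,
# so take an 80% random subset with seed 42 for training.
nst = load_dataset("marinone94/nst_sv", split="train")
nst_train = nst.train_test_split(train_size=0.8, seed=42)["train"]

# Common Voice 8.0 Swedish: train and validation splits combined.
cv_train = load_dataset(
    "mozilla-foundation/common_voice_8_0",
    "sv-SE",
    split="train+validation",
    use_auth_token=True,  # Common Voice 8.0 requires accepting the terms on the Hub
)
```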