---
language:
- sv-SE
license: cc0-1.0
tags:
- automatic-speech-recognition
- mozilla-foundation/common_voice_8_0
- generated_from_trainer
- sv
- robust-speech-event
- model_for_talk
datasets:
- mozilla-foundation/common_voice_8_0
- marinone94/nst_sv
model-index:
- name: XLS-R-300M - Swedish
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_8_0
      type: mozilla-foundation/common_voice_8_0
      args: sv-SE
    metrics:
    - name: Test WER
      type: wer
      value: 16.98
    - name: Test CER
      type: cer
      value: 5.66
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: speech-recognition-community-v2/dev_data
      type: speech-recognition-community-v2/dev_data
      args: sv
    metrics:
    - name: Test WER
      type: wer
      value: 27.01
    - name: Test CER
      type: cer
      value: 13.14
---

This model is a fine-tuned version of [KBLab/wav2vec2-large-voxrex](https://huggingface.co/KBLab/wav2vec2-large-voxrex). It was first trained for 2 epochs on the MARINONE94/NST_SV - SV dataset (an 80% random split with seed 42, because the dataset currently has only the "train" split), and then for 50 epochs on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - SV-SE dataset ("train+validation" split).

NOTE: the first training stage has not worked as expected so far, so it might be useless or even degrade performance. Further investigation and development are needed.

`d73da225cfdc57213ea4ab67b24bb87ac41f4392` is the commit at the end of the first training stage.

See `run.sh` for a complete overview of all the training steps. It can be run with:
``` | |
sh run.sh | |
``` |
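
The "80% random split with seed 42" mentioned above can be illustrated with a minimal sketch. This is not the code from `run.sh`: the helper name `random_split` and the toy size of 1000 examples are hypothetical, and the actual training most likely relied on the shuffling of its data-loading library, so the selected indices would differ. The sketch only shows why fixing the seed makes the split reproducible.

```python
import random


def random_split(n_examples, train_fraction=0.8, seed=42):
    """Return (train_indices, held_out_indices) for a seeded random split.

    Hypothetical helper for illustration; not taken from run.sh.
    """
    indices = list(range(n_examples))
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(indices)
    cut = int(n_examples * train_fraction)
    return sorted(indices[:cut]), sorted(indices[cut:])


# Toy example: 1000 stands in for the number of rows in the "train" split.
train_idx, held_out_idx = random_split(1000)
print(len(train_idx), len(held_out_idx))
```

Calling `random_split` again with the same seed returns the same two index lists, so the held-out 20% stays unseen across runs.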