KBLab/wav2vec2-large-voxrex-swedish

Automatic Speech Recognition


marma committed on Jan 10, 2022

Commit ce279e0 • 1 Parent(s): 81f9b47

Update README.md

Files changed (1)

README.md +3 -3

README.md CHANGED

@@ -26,11 +26,11 @@ model-index:
       type: wer
       value: 9.914
 ---
-# Wav2vec 2.0 large VoxRex Swedish (B)
 **Disclaimer:** This is a work in progress. See [VoxRex](https://huggingface.co/KBLab/wav2vec2-large-voxrex) for more details.
-Finetuned version of KB's [VoxRex large](https://huggingface.co/KBLab/wav2vec2-large-voxrex) model using Swedish radio broadcasts, NST and Common Voice data. Evaluation without a language model gives the following: WER for NST + Common Voice test set (2% of total sentences) is **3.617%**. WER for Common Voice test set is **9.914%** directly and **7.77%** with a 4-gram language model.
 When using this model, make sure that your speech input is sampled at 16kHz.
@@ -40,7 +40,7 @@ When using this model, make sure that your speech input is sampled at 16kHz.
 <center>*<i>Chart shows performance without the additional 20k steps of Common Voice fine-tuning</i></center>
 ## Training
-This model has been fine-tuned for 120000 updates on NST + CommonVoice and then for an additional 20000 updates on CommonVoice only. The additional fine-tuning on CommonVoice hurts performance on the NST+CommonVoice test set somewhat and, unsurprisingly, improves it on the CommonVoice test set. It seems to perform generally better though [citation needed].
 ![WER during training](chart_1.svg "WER")

       type: wer
       value: 9.914
 ---
+# Wav2vec 2.0 large VoxRex Swedish (C)
 **Disclaimer:** This is a work in progress. See [VoxRex](https://huggingface.co/KBLab/wav2vec2-large-voxrex) for more details.
+Finetuned version of KB's [VoxRex large](https://huggingface.co/KBLab/wav2vec2-large-voxrex) model using Swedish radio broadcasts, NST and Common Voice data. Evaluation without a language model gives the following: WER for NST + Common Voice test set (2% of total sentences) is **2.5%**. WER for Common Voice test set is **8.49%** directly and **7.37%** with a 4-gram language model.
 When using this model, make sure that your speech input is sampled at 16kHz.
 <center>*<i>Chart shows performance without the additional 20k steps of Common Voice fine-tuning</i></center>
 ## Training
+This model has been fine-tuned for 120000 updates on NST + CommonVoice<del> and then for an additional 20000 updates on CommonVoice only. The additional fine-tuning on CommonVoice hurts performance on the NST+CommonVoice test set somewhat and, unsurprisingly, improves it on the CommonVoice test set. It seems to perform generally better though [citation needed]</del>.
 ![WER during training](chart_1.svg "WER")
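
The README reports word error rate (WER) figures but does not say how they were computed. As a reading aid, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the evaluation behind the reported numbers may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of five reference words.
    print(word_error_rate("det är en fin dag", "det är fin dag"))
```

Multiplying the result by 100 gives a percentage comparable in form to the figures quoted in the README.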