marma committed on
Commit 5276b3b
2 Parent(s): 5c946f0 4ef50cd

Merge branch 'main' of https://huggingface.co/KBLab/wav2vec2-large-voxpopuli-sv-swedish into main

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -25,18 +25,18 @@ model-index:
  metrics:
  - name: Test WER
  type: wer
- value: 13.585485
+ value: 13.386893
  - name: Test CER
  type: cer
- value: 4.850368
+ value: 4.795275
  ---
  # Wav2vec 2.0 large-voxpopuli-sv-swedish
- Finetuned version of Facebooks [VoxPopuli-sv large](https://huggingface.co/facebook/wav2vec2-large-sv-voxpopuli) model using NST and Common Voice data. Evalutation without a language model gives the following: WER for NST + Common Voice test set (2% of total sentences) is **6.30%**, WER for Common Voice test set is **13.59%** directly and **9.5%** with a 4-gram language model.
+ Finetuned version of Facebooks [VoxPopuli-sv large](https://huggingface.co/facebook/wav2vec2-large-sv-voxpopuli) model using NST and Common Voice data. Evalutation without a language model gives the following: WER for NST + Common Voice test set (2% of total sentences) is **6.58%**, WER for Common Voice test set is **13.39%** directly and **9.5%** with a 4-gram language model.
 
  When using this model, make sure that your speech input is sampled at 16kHz.
 
  ## Training
- This model has been fine-tuned for 80000 updates on NST + CommonVoice and then for an additional 20000 steps on only CommonVoice. The additional fine-tuning on CommonVoce hurts performance on the NST+CommonVoice test set somewhat and, unsurprisingly, improves it on the CommonVoice test set. It seems to perform generally better though [citation needed].
+ This model has been fine-tuned for 80000 updates on NST + CommonVoice and then for an additional 40000 steps on only CommonVoice. The additional fine-tuning on CommonVoce hurts performance on the NST+CommonVoice test set somewhat and, unsurprisingly, improves it on the CommonVoice test set. It seems to perform generally better though [citation needed].
 
  ## Usage
  The model can be used directly (without a language model) as follows:
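This commit updates the model card's word error rate (WER) and character error rate (CER). Note that the metadata values are percentages, so the new `value: 13.386893` corresponds to the **13.39%** Common Voice WER in the prose. The following is a minimal sketch of the standard definitions: the edit distance between reference and hypothesis, summed over the test set and divided by the total number of reference words (WER) or characters (CER). It is an assumption that the model card numbers were computed this way. The actual evaluation script and any text normalization (casing, punctuation) are not part of this commit. The helper names `edit_distance`, `wer` and `cer` are hypothetical and chosen for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Substitutions, deletions and insertions each cost 1.
    Works on lists of words or on strings (sequences of characters).
    """
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def _check(references, hypotheses):
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")


def wer(references, hypotheses):
    """Corpus-level WER: total word edits / total reference words (hypothetical helper)."""
    _check(references, hypotheses)
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    return errors / sum(len(r.split()) for r in references)


def cer(references, hypotheses):
    """Corpus-level CER: total character edits / total reference characters (hypothetical helper)."""
    _check(references, hypotheses)
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return errors / sum(len(r) for r in references)


refs = ["det här är ett test"]
hyps = ["det här var ett test"]
print(wer(refs, hyps))  # 0.2 (one substituted word out of five)
```

Pooling errors over the whole set, rather than averaging per-sentence rates, keeps long and short utterances weighted by their length. Multiplying the result by 100 gives the percentage form used in the model card.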