FremyCompany/xls-r-nl-v1-cv8-lm

Tags: Automatic Speech Recognition, hf-asr-leaderboard, mozilla-foundation/common_voice_8_0, robust-speech-event


FremyCompany committed on Feb 1, 2022

Commit f6ca04e (parent: 7d3ab33)

Improve description of the system

Files changed (1)

README.md +5 -4

@@ -44,16 +44,17 @@ model-index:
          value: 11.26
 ---
-# output
-This model is a version of [facebook/wav2vec2-xls-r-2b-22-to-16](https://huggingface.co/facebook/wav2vec2-xls-r-2b-22-to-16) fine-tuned mainly on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - NL dataset (see details below).
-It achieves the following results on the evaluation set (of Common Voice 8.0):
 - Wer: 0.0669
 - Cer: 0.0197
 ## Model description
-The model takes 16kHz sound input, and uses a Wav2Vec2ForCTC decoder with 48 letters to output the final result.
 ## Intended uses & limitations

          value: 11.26
 ---
+# XLS-R-based CTC model with 5-gram language model from Common Voice
+This model is a version of [facebook/wav2vec2-xls-r-2b-22-to-16](https://huggingface.co/facebook/wav2vec2-xls-r-2b-22-to-16) fine-tuned mainly on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - NL dataset (see details below), with a small 5-gram language model, built from the Common Voice training corpus, added on top. This model achieves the following results on the evaluation set (of Common Voice 8.0):
 - Wer: 0.0669
 - Cer: 0.0197
 ## Model description
+The model takes 16kHz sound input, and uses a Wav2Vec2ForCTC decoder with 48 letters to output the final result.
+To improve accuracy, a beam decoder is used; the beams are scored with a 5-gram language model trained on the Common Voice 8 corpus.
 ## Intended uses & limitations
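The updated README says the CTC output is decoded with a beam decoder whose beams are scored by a 5-gram language model. As a rough illustration of that idea, here is a textbook CTC prefix beam search with shallow fusion of a word-level LM. It is a minimal sketch, not the decoder this model actually uses. Everything concrete in it is hypothetical: the three-letter alphabet with `_` as the blank, the toy word-bigram table standing in for the 5-gram model, the LM weight `alpha`, and the helper names `ctc_beam_search` and `lm_score`.

```python
import math
from collections import defaultdict

NEG_INF = -math.inf
BLANK = "_"  # hypothetical blank symbol of the toy alphabet


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def lm_score(text, bigram, final, floor=1e-4):
    """Toy word-bigram log-probability (a stand-in for a real 5-gram LM).

    During the search only completed words (followed by a space) are scored;
    on the final pass the trailing word is scored as well.
    """
    words = text.split()
    if not final and not text.endswith(" ") and words:
        words = words[:-1]  # the last word may still be growing
    prev, total = "<s>", 0.0
    for w in words:
        total += math.log(bigram.get((prev, w), floor))
        prev = w
    return total


def ctc_beam_search(frames, alphabet, bigram, beam_width=4, alpha=0.5):
    """CTC prefix beam search with shallow fusion of a word LM.

    frames: per-time-step probability lists aligned with `alphabet`.
    Each beam stores (log P ending in blank, log P ending in non-blank).
    """
    def rank(item, final=False):
        prefix, (pb, pnb) = item
        return logsumexp(pb, pnb) + alpha * lm_score(prefix, bigram, final)

    beams = {"": (0.0, NEG_INF)}
    for frame in frames:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (pb, pnb) in beams.items():
            for ch, p in zip(alphabet, frame):
                if p <= 0.0:
                    continue
                lp = math.log(p)
                if ch == BLANK:
                    # A blank leaves the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, pb + lp, pnb + lp), nb)
                    continue
                new_prefix = prefix + ch
                b, nb = nxt[new_prefix]
                if prefix and prefix[-1] == ch:
                    # A repeated symbol starts a new character only after a blank...
                    nxt[new_prefix] = (b, logsumexp(nb, pb + lp))
                    # ...otherwise it collapses into the current prefix.
                    b2, nb2 = nxt[prefix]
                    nxt[prefix] = (b2, logsumexp(nb2, pnb + lp))
                else:
                    nxt[new_prefix] = (b, logsumexp(nb, pb + lp, pnb + lp))
        # Keep the best `beam_width` prefixes by acoustic + weighted LM score.
        beams = dict(sorted(nxt.items(), key=rank, reverse=True)[:beam_width])
    best = max(beams.items(), key=lambda item: rank(item, final=True))
    return best[0].strip()
```

With a single ambiguous frame (`a` at 0.4, `b` at 0.6) and a bigram table that strongly prefers the word `a`, `alpha=0.0` returns the acoustically preferred `b`, while `alpha=0.5` returns `a`. In other words, the LM can overturn a close acoustic decision. Scoring a word only once a space completes it (and the last word at the very end) avoids penalizing partial words in the middle of the search.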