# XLS-R-based CTC model with 5-gram language model from Open Subtitles

This model is a version of facebook/wav2vec2-xls-r-2b-22-to-16 fine-tuned mainly on the CGN dataset, as well as on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - NL dataset (see details below). On top of the acoustic model, a large 5-gram language model trained on the Dutch Open Subtitles corpus is used for decoding. This model achieves the following results on the evaluation set of Common Voice 8.0:

• WER: 0.04057
• CER: 0.01222

## Model description

The model takes 16 kHz audio as input and uses a Wav2Vec2ForCTC head with a 48-character vocabulary to output per-frame character probabilities.

To improve accuracy, a beam-search decoder based on pyctcdecode is then used; it reranks the most promising alignments based on a 5-gram language model trained on the Open Subtitles Dutch corpus.
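The reranking idea can be sketched with a toy example: each beam hypothesis carries an acoustic score, and a weighted language-model score is added before picking the winner. The candidate strings, the hard-coded "LM" probabilities, and the alpha weight below are all invented for illustration; pyctcdecode combines real KenLM n-gram scores with beam scores in a similar spirit:

```python
import math

# Toy sketch of LM reranking: combine each hypothesis's acoustic log-probability
# with a weighted language-model log-probability, then keep the best total.
# The word probabilities and alpha weight are invented for this example.

# Hypothetical "language model": log-probabilities of whole candidate strings.
LM_LOGPROB = {
    "dat is goed": math.log(0.02),    # plausible Dutch phrase
    "dat is hoed": math.log(0.0001),  # acoustically similar but unlikely
}

def rerank(candidates, alpha=0.5):
    """candidates: list of (text, acoustic_logprob). Returns the best text."""
    scored = [(text, ac + alpha * LM_LOGPROB[text]) for text, ac in candidates]
    return max(scored, key=lambda pair: pair[1])[0]

# The acoustically best hypothesis loses once the LM score is added.
beams = [("dat is hoed", -1.0), ("dat is goed", -1.2)]
print(rerank(beams))  # -> "dat is goed"
```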

## Intended uses & limitations

This model can be used to transcribe spoken Dutch (including Flemish) to text (without punctuation).

## Training and evaluation data

The model was:

1. initialized from the 2B-parameter facebook/wav2vec2-xls-r-2b-22-to-16 checkpoint.
2. trained for 5 epochs (6000 iterations with batch size 32) on the cv8/nl dataset.
3. trained for 1 epoch (36000 iterations with batch size 32) on the CGN dataset.
4. trained for another 5 epochs (6000 iterations with batch size 32) on the cv8/nl dataset.
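The three-stage schedule above can be summarized as follows; the tuple layout is just a convenient sketch, not a real training config:

```python
# Summary of the fine-tuning schedule described above.
# Each entry: (dataset, epochs, iterations, batch_size).
STAGES = [
    ("cv8/nl", 5, 6000, 32),
    ("cgn", 1, 36000, 32),
    ("cv8/nl", 5, 6000, 32),
]

total_iterations = sum(iters for _, _, iters, _ in STAGES)
total_samples_seen = sum(iters * bs for _, _, iters, bs in STAGES)
print(total_iterations)    # -> 48000
print(total_samples_seen)  # -> 1536000
```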

### Framework versions

• Transformers 4.16.0
• PyTorch 1.10.2+cu102
• Datasets 1.18.3
• Tokenizers 0.11.0