# XLS-R-based CTC model with 5-gram language model from Open Subtitles

This model is a version of facebook/wav2vec2-xls-r-2b-22-to-16 fine-tuned mainly on the CGN dataset, as well as on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - NL dataset (see details below). On top of the acoustic model, a large 5-gram language model trained on the Dutch Open Subtitles corpus is used for decoding. This model achieves the following results on the evaluation set of Common Voice 8.0:

• WER: 0.04057
• CER: 0.01222

## Model description

The model takes 16 kHz audio as input and uses a Wav2Vec2ForCTC head with a 48-character vocabulary to output per-frame character probabilities.

To improve accuracy, a beam-search decoder based on pyctcdecode is then used; it reranks the most promising alignments based on a 5-gram language model trained on the Open Subtitles Dutch corpus.
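The reranking idea can be sketched with a toy example: each beam hypothesis carries an acoustic score, and a weighted language-model score is added before picking the winner. The candidate strings, the hard-coded "LM" probabilities, and the alpha weight below are all invented for illustration; pyctcdecode combines real KenLM n-gram scores with beam scores in a similar spirit:

```python
import math

# Toy sketch of LM reranking: combine each hypothesis's acoustic log-probability
# with a weighted language-model log-probability, then keep the best total.
# The word probabilities and alpha weight are invented for this example.

# Hypothetical "language model": log-probabilities of whole candidate strings.
LM_LOGPROB = {
    "dat is goed": math.log(0.02),    # plausible Dutch phrase
    "dat is hoed": math.log(0.0001),  # acoustically similar but unlikely
}

def rerank(candidates, alpha=0.5):
    """candidates: list of (text, acoustic_logprob). Returns the best text."""
    scored = [(text, ac + alpha * LM_LOGPROB[text]) for text, ac in candidates]
    return max(scored, key=lambda pair: pair[1])[0]

# The acoustically best hypothesis loses once the LM score is added.
beams = [("dat is hoed", -1.0), ("dat is goed", -1.2)]
print(rerank(beams))  # -> "dat is goed"
```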

## Intended uses & limitations

This model can be used to transcribe spoken Dutch (including Flemish) to text (without punctuation).

## Training and evaluation data

The model was:

1. initialized from the 2B-parameter facebook/wav2vec2-xls-r-2b-22-to-16 checkpoint.
2. trained for 5 epochs (6000 iterations with batch size 32) on the cv8/nl dataset.
3. trained for 1 epoch (36000 iterations with batch size 32) on the CGN dataset.
4. trained for another 5 epochs (6000 iterations with batch size 32) on the cv8/nl dataset.
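The three-stage schedule above can be summarized as follows; the tuple layout is just a convenient sketch, not a real training config:

```python
# Summary of the fine-tuning schedule described above.
# Each entry: (dataset, epochs, iterations, batch_size).
STAGES = [
    ("cv8/nl", 5, 6000, 32),
    ("cgn", 1, 36000, 32),
    ("cv8/nl", 5, 6000, 32),
]

total_iterations = sum(iters for _, _, iters, _ in STAGES)
total_samples_seen = sum(iters * bs for _, _, iters, bs in STAGES)
print(total_iterations)    # -> 48000
print(total_samples_seen)  # -> 1536000
```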

### Framework versions

• Transformers 4.16.0
• PyTorch 1.10.2+cu102
• Datasets 1.18.3
• Tokenizers 0.11.0