Wav2vec2-base pretraining for Danish
This wav2vec2-base model has been pretrained on ~1300 hours of Danish speech data. The pretraining data consists of podcasts and audiobooks and is unfortunately not publicly available. However, we are allowed to distribute the pretrained model.
The pretraining was done using the fairseq library in January 2021.
The model has only been pretrained, so it needs to be fine-tuned on transcribed speech before it can perform speech recognition.
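Fine-tuning for speech recognition is commonly done with a CTC head on top of the pretrained encoder, which requires a character vocabulary built from the fine-tuning transcripts. The following is a minimal sketch of that preparatory step only, not the procedure used by the authors. The function name `build_ctc_vocab`, the two example transcripts, and the special token names `[UNK]` and `[PAD]` are assumptions chosen for illustration.

```python
import json


def build_ctc_vocab(transcripts):
    """Map every character in the transcripts to an integer id (hypothetical helper)."""
    # Lowercase the text and mark word boundaries with "|" instead of a space,
    # a convention often used by CTC tokenizers.
    text = " ".join(transcripts).lower().replace(" ", "|")
    chars = sorted(set(text))
    vocab = {c: i for i, c in enumerate(chars)}
    # Special tokens (names are assumptions): unknown characters and padding.
    # Many CTC setups also use the padding token as the blank symbol.
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


# Hypothetical Danish transcripts standing in for a real fine-tuning dataset.
transcripts = ["jeg bor i århus", "rødgrød med fløde"]
vocab = build_ctc_vocab(transcripts)
print(json.dumps(vocab, ensure_ascii=False))
```

The resulting mapping can be saved as a JSON file and used to set up the tokenizer of whichever fine-tuning toolkit is chosen. Danish letters such as æ, ø and å are kept as single vocabulary entries.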