---
language: da
datasets:
- common-voice-9
- nst
tags:
- speech-to-text
license: apache-2.0
---

# xls-r-300m-danish-nst-cv9

This is a version of [chcaa/xls-r-300m-danish](https://huggingface.co/chcaa/xls-r-300m-danish) finetuned for Danish ASR on the training set of the public NST dataset and the Danish part of Common Voice 9. The model was trained on audio sampled at 16 kHz, so make sure your input uses the same sample rate.

The model was trained using fairseq with [this config](https://github.com/centre-for-humanities-computing/Gjallarhorn/blob/main/fairseq_configs/finetuning/xlrs_finetune.yaml) for 120,000 steps.

## Usage

```python
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# load model and processor (feature extractor + tokenizer)
processor = Wav2Vec2Processor.from_pretrained(
    "chcaa/xls-r-300m-danish-nst-cv9")
model = Wav2Vec2ForCTC.from_pretrained(
    "chcaa/xls-r-300m-danish-nst-cv9")

# load dataset and read soundfiles
ds = load_dataset("Alvenir/alvenir_asr_da_eval", split="test")
audio = ds[0]["audio"]

# extract input features; passing the sampling rate lets the processor
# raise an error if the audio is not 16 kHz
input_values = processor(
    audio["array"],
    sampling_rate=audio["sampling_rate"],
    return_tensors="pt",
    padding="longest",
).input_values  # Batch size 1

# retrieve logits (no gradients needed for inference)
with torch.no_grad():
    logits = model(input_values).logits

# take argmax and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription)
```

## Performance

The table below shows the word error rate (WER; greedy decoding, no language model) of four different Danish ASR models on three publicly available datasets (lower is better).

| Model | [Alvenir](https://huggingface.co/datasets/Alvenir/alvenir_asr_da_eval) | [NST](https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-19/) | [CV9.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0) |
|:--------------------------------------|------:|-----:|-----:|
| [Alvenir/wav2vec2-base-da-ft-nst](https://huggingface.co/Alvenir/wav2vec2-base-da-ft-nst) | 0.202 | 0.099 | 0.238 |
| [chcaa/alvenir-wav2vec2-base-da-nst-cv9](https://huggingface.co/chcaa/alvenir-wav2vec2-base-da-nst-cv9) | 0.233 | 0.126 | 0.256 |
| [chcaa/xls-r-300m-nst-cv9-da](https://huggingface.co/chcaa/xls-r-300m-nst-cv9-da) | 0.105 | 0.060 | 0.119 |
| chcaa/xls-r-300m-danish-nst-cv9 | 0.082 | 0.051 | 0.108 |

The model was finetuned in collaboration with [Alvenir](https://alvenir.ai).
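
For readers unfamiliar with the metric used in the Performance table, the following is a minimal sketch of how a word error rate can be computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a model transcript, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, and this is not necessarily the script or text normalization that produced the numbers in the table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Hypothetical helper for illustration; splits on whitespace only and
    applies no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j]: edit distance between the reference words seen so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word ("en") and one inserted word ("i") out of 5 reference words
print(word_error_rate("det er en god dag", "det er god dag i"))
```

A corpus-level score is usually obtained by summing the edit distances over all utterances and dividing by the total number of reference words, rather than averaging per-utterance rates.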