HLasse committed 31756dd (1 parent: 77b603d): Create README.md

Files changed (1): README.md (+57, -0)

---
language: da
datasets:
- common-voice-9
- nst
tags:
- speech-to-text
license: apache-2.0
---

# xls-r-300m-danish-nst-cv9

This is a version of [alvenir/wav2vec2-base-da](https://huggingface.co/Alvenir/wav2vec2-base-da) fine-tuned for Danish ASR on the training set of the public NST dataset and the Danish part of Common Voice 9. The model was trained on 16 kHz audio, so make sure your input uses the same sample rate (see the resampling note below the usage example).

The model was trained using fairseq for 120,000 steps.

## Usage
```python
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# load model and processor
processor = Wav2Vec2Processor.from_pretrained(
    "chcaa/alvenir-wav2vec2-base-da-nst-cv9")
model = Wav2Vec2ForCTC.from_pretrained(
    "chcaa/alvenir-wav2vec2-base-da-nst-cv9")

# load dataset and read sound files
ds = load_dataset("Alvenir/alvenir_asr_da_eval", split="test")

# extract input features from the raw waveform
input_values = processor(
    ds[0]["audio"]["array"], sampling_rate=16_000,
    return_tensors="pt", padding="longest"
).input_values  # batch size 1

# retrieve logits
with torch.no_grad():
    logits = model(input_values).logits

# take argmax and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription)
```
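
If your own recordings are not already sampled at 16 kHz, resample them before feeding them to the model. A minimal sketch using the `datasets` `Audio` feature and the `processor` from above; the file name `my_recording.wav` is only a placeholder:

```python
from datasets import Audio, Dataset

# wrap a local file (placeholder path) in a dataset and let the Audio
# feature decode and resample it to 16 kHz when the sample is accessed
my_ds = Dataset.from_dict({"audio": ["my_recording.wav"]})
my_ds = my_ds.cast_column("audio", Audio(sampling_rate=16_000))

sample = my_ds[0]["audio"]  # dict with "array" and "sampling_rate"
input_values = processor(
    sample["array"], sampling_rate=16_000, return_tensors="pt"
).input_values
```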

## Performance
The table below shows the word error rate (WER) of four different Danish ASR models on three publicly available evaluation datasets.

|Model | [Alvenir](https://huggingface.co/datasets/Alvenir/alvenir_asr_da_eval)| [NST](https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-19/)| [CV9.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0)|
|:--------------------------------------|------:|-----:|-----:|
|[Alvenir/wav2vec2-base-da-ft-nst](https://huggingface.co/Alvenir/wav2vec2-base-da-ft-nst) | 0.202| 0.099| 0.238|
|chcaa/alvenir-wav2vec2-base-da-nst-cv9 | 0.233| 0.126| 0.256|
|[chcaa/xls-r-300m-nst-cv9-da](https://huggingface.co/chcaa/xls-r-300m-nst-cv9-da) | 0.105| 0.060| 0.119|
|[chcaa/xls-r-300m-danish-nst-cv9](https://huggingface.co/chcaa/xls-r-300m-danish-nst-cv9) | 0.082| 0.051| 0.108|

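As a rough sketch of how such a score can be reproduced, the snippet below greedily decodes the Alvenir evaluation set and computes WER with the `evaluate` library; the lack of text normalisation and the name of the reference column (`sentence`) are assumptions, not necessarily the exact evaluation setup behind the table.

```python
import torch
import evaluate
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "chcaa/alvenir-wav2vec2-base-da-nst-cv9"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
wer_metric = evaluate.load("wer")

ds = load_dataset("Alvenir/alvenir_asr_da_eval", split="test")

predictions, references = [], []
for sample in ds:
    inputs = processor(
        sample["audio"]["array"], sampling_rate=16_000, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    pred_ids = torch.argmax(logits, dim=-1)
    predictions.append(processor.batch_decode(pred_ids)[0])
    # "sentence" is an assumed column name for the reference transcription
    references.append(sample["sentence"])

print("WER:", wer_metric.compute(predictions=predictions, references=references))
```
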
The model was fine-tuned in collaboration with [Alvenir](https://alvenir.ai).