HLasse committed 31756dd (1 parent: 77b603d): Create README.md

Files changed (1): README.md (+57, -0)

---
language: da
datasets:
- common-voice-9
- nst
tags:
- speech-to-text
license: apache-2.0
---

# xls-r-300m-danish-nst-cv9

This is a version of [alvenir/wav2vec2-base-da](https://huggingface.co/Alvenir/wav2vec2-base-da) fine-tuned for Danish ASR on the training set of the public NST dataset and the Danish part of Common Voice 9. The model was trained on 16 kHz audio, so make sure your input uses the same sample rate (see the resampling note below the usage example).

The model was trained using fairseq for 120,000 steps.

## Usage
```python
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# load model and processor
processor = Wav2Vec2Processor.from_pretrained(
    "chcaa/alvenir-wav2vec2-base-da-nst-cv9")
model = Wav2Vec2ForCTC.from_pretrained(
    "chcaa/alvenir-wav2vec2-base-da-nst-cv9")

# load dataset and read sound files
ds = load_dataset("Alvenir/alvenir_asr_da_eval", split="test")

# extract input features from the raw waveform
input_values = processor(
    ds[0]["audio"]["array"], sampling_rate=16_000,
    return_tensors="pt", padding="longest"
).input_values  # batch size 1

# retrieve logits
with torch.no_grad():
    logits = model(input_values).logits

# take argmax and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription)
```
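
If your own recordings are not already sampled at 16 kHz, resample them before feeding them to the model. A minimal sketch using the `datasets` `Audio` feature and the `processor` from above; the file name `my_recording.wav` is only a placeholder:

```python
from datasets import Audio, Dataset

# wrap a local file (placeholder path) in a dataset and let the Audio
# feature decode and resample it to 16 kHz when the sample is accessed
my_ds = Dataset.from_dict({"audio": ["my_recording.wav"]})
my_ds = my_ds.cast_column("audio", Audio(sampling_rate=16_000))

sample = my_ds[0]["audio"]  # dict with "array" and "sampling_rate"
input_values = processor(
    sample["array"], sampling_rate=16_000, return_tensors="pt"
).input_values
```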

## Performance
The table below shows the word error rate (WER) of four different Danish ASR models on three publicly available evaluation datasets.

|Model | [Alvenir](https://huggingface.co/datasets/Alvenir/alvenir_asr_da_eval)| [NST](https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-19/)| [CV9.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0)|
|:--------------------------------------|------:|-----:|-----:|
|[Alvenir/wav2vec2-base-da-ft-nst](https://huggingface.co/Alvenir/wav2vec2-base-da-ft-nst) | 0.202| 0.099| 0.238|
|chcaa/alvenir-wav2vec2-base-da-nst-cv9 | 0.233| 0.126| 0.256|
|[chcaa/xls-r-300m-nst-cv9-da](https://huggingface.co/chcaa/xls-r-300m-nst-cv9-da) | 0.105| 0.060| 0.119|
|[chcaa/xls-r-300m-danish-nst-cv9](https://huggingface.co/chcaa/xls-r-300m-danish-nst-cv9) | 0.082| 0.051| 0.108|

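As a rough sketch of how such a score can be reproduced, the snippet below greedily decodes the Alvenir evaluation set and computes WER with the `evaluate` library; the lack of text normalisation and the name of the reference column (`sentence`) are assumptions, not necessarily the exact evaluation setup behind the table.

```python
import torch
import evaluate
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "chcaa/alvenir-wav2vec2-base-da-nst-cv9"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
wer_metric = evaluate.load("wer")

ds = load_dataset("Alvenir/alvenir_asr_da_eval", split="test")

predictions, references = [], []
for sample in ds:
    inputs = processor(
        sample["audio"]["array"], sampling_rate=16_000, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    pred_ids = torch.argmax(logits, dim=-1)
    predictions.append(processor.batch_decode(pred_ids)[0])
    # "sentence" is an assumed column name for the reference transcription
    references.append(sample["sentence"])

print("WER:", wer_metric.compute(predictions=predictions, references=references))
```
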
The model was fine-tuned in collaboration with [Alvenir](https://alvenir.ai).