flozi00 committed on
Commit
992e5e1
·
1 Parent(s): aad68e1

Create README.md

  1. README.md +119 -0
README.md ADDED
@@ -0,0 +1,119 @@
+ ---
+ language: de
+ datasets:
+ - common_voice
+ metrics:
+ - wer
+ - cer
+ tags:
+ - audio
+ - automatic-speech-recognition
+ - speech
+ - xlsr-fine-tuning-week
+ license: apache-2.0
+ model-index:
+ - name: wav2vec2-xls-r-1b-5gram-german with LM by Florian Zimmermeister
+   results:
+   - task:
+       name: Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Common Voice de
+       type: common_voice
+       args: de
+     metrics:
+     - name: Test WER
+       type: wer
+       value: 4.382541642219636
+     - name: Test CER
+       type: cer
+       value: 1.6235493024026488
+ ---
+ **Test Result**
+
+ | Model | WER | CER |
+ | ------------- | ------------- | ------------- |
+ | flozi00/wav2vec2-large-xlsr-53-german-with-lm | **4.382541642219636%** | **1.6235493024026488%** |
+
+ ## Evaluation
+ The model can be evaluated as follows on the German test data of Common Voice.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCTC, AutoProcessor
+ from unidecode import unidecode
+ import re
+ from datasets import load_dataset, load_metric
+ import datasets
+
+ # Running totals, updated once per evaluated sample.
+ counter = 0
+ wer_counter = 0
+ cer_counter = 0
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+
+ # Umlauts are swapped for ASCII placeholders so that unidecode() leaves them intact.
+ special_chars = [["Ä"," AE "], ["Ö"," OE "], ["Ü"," UE "], ["ä"," ae "], ["ö"," oe "], ["ü"," ue "]]
+ def clean_text(sentence):
+     for special in special_chars:
+         sentence = sentence.replace(special[0], special[1])
+
+     # Transliterate all remaining non-ASCII characters.
+     sentence = unidecode(sentence)
+
+     # Restore the umlauts from their placeholders.
+     for special in special_chars:
+         sentence = sentence.replace(special[1], special[0])
+
+     # Replace everything outside the allowed character set with a space.
+     sentence = re.sub("[^a-zA-Z0-9öäüÖÄÜ ,.!?]", " ", sentence)
+
+     return sentence
+
+ def main(model_id):
+     print("load model")
+     model = AutoModelForCTC.from_pretrained(model_id).to(device)
+     print("load processor")
+     # The processor is loaded from the same repository as the model.
+     processor = AutoProcessor.from_pretrained(model_id)
+
+     print("load metrics")
+     # load_metric is deprecated in recent datasets releases; evaluate.load is its successor.
+     wer = load_metric("wer")
+     cer = load_metric("cer")
+
+     ds = load_dataset("mozilla-foundation/common_voice_8_0","de")
+     ds = ds["test"]
+
+     # Resample the audio to the 16 kHz rate passed to the processor.
+     ds = ds.cast_column(
+         "audio", datasets.features.Audio(sampling_rate=16_000)
+     )
+
+     def calculate_metrics(batch):
+         global counter, wer_counter, cer_counter
+         resampled_audio = batch["audio"]["array"]
+
+         input_values = processor(resampled_audio, return_tensors="pt", sampling_rate=16_000).input_values
+
+         with torch.no_grad():
+             logits = model(input_values.to(device)).logits.cpu().numpy()[0]
+
+         # Decode the logits with the language-model-backed decoder of the processor.
+         decoded = processor.decode(logits)
+         pred = decoded.text.lower()
+
+         ref = clean_text(batch["sentence"]).lower()
+
+         wer_result = wer.compute(predictions=[pred], references=[ref])
+         cer_result = cer.compute(predictions=[pred], references=[ref])
+
+         counter += 1
+         wer_counter += wer_result
+         cer_counter += cer_result
+
+         # Print the running averages every 100 samples.
+         if counter % 100 == 0:
+             print(f"WER: {(wer_counter/counter)*100} | CER: {(cer_counter/counter)*100}")
+
+         return batch
+
+
+     ds.map(calculate_metrics, remove_columns=ds.column_names)
+     print(f"WER: {(wer_counter/counter)*100} | CER: {(cer_counter/counter)*100}")
+
+ model_id = "flozi00/wav2vec2-xls-r-1b-5gram-german"
+ main(model_id)
+ ```
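
The evaluation script above scores one sentence at a time and then averages the per-sentence values, so the printed WER and CER are means over utterances and can differ from a corpus-level rate (total errors divided by total reference length). The following is a minimal sketch of that difference, not the implementation behind the `wer` and `cer` metrics. The helpers `edit_distance`, `wer` and `cer` and the three sentence pairs are hypothetical and exist only for illustration; the sketch also assumes that spaces count as characters for CER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate of one hypothesis against one non-empty reference."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate; spaces are treated as characters (assumption)."""
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical (reference, prediction) pairs, for illustration only.
pairs = [
    ("guten morgen", "guten morgen"),
    ("das ist ein sehr langer satz", "das ist ein sehr langer platz"),
    ("hallo", "hallo welt"),
]

# Mean of per-sentence WER, which is what the evaluation script accumulates.
per_sentence = sum(wer(r, h) for r, h in pairs) / len(pairs)

# Corpus-level WER: all word errors divided by all reference words.
total_errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
total_words = sum(len(r.split()) for r, _ in pairs)
corpus_level = total_errors / total_words

print(f"per-sentence mean WER: {per_sentence:.4f}")
print(f"corpus-level WER:      {corpus_level:.4f}")
```

Short references weigh heavily in the per-sentence mean: a single inserted word against a one-word reference already counts as 100% WER for that sentence, while it contributes only one error to the corpus-level total.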