carlosdanielhernandezmena committed on
Commit
3498789
1 Parent(s): b7c693c

Adding info to the Readme file

  1. README.md +208 -0
README.md CHANGED
@@ -1,3 +1,211 @@
  ---
+ language: is
+ datasets:
+ - samromur
+ - samromur_children
+ - malromur
+ - althingi
+ tags:
+ - audio
+ - automatic-speech-recognition
+ - icelandic
+ - xlrs-53-icelandic
+ - iceland
+ - reykjavik
+ - samromur
  license: cc-by-4.0
+ widget:
+ model-index:
+ - name: wav2vec2-large-xlsr-53-icelandic-ep10-1000h
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Samrómur (Test)
+       type: language-and-voice-lab/samromur_asr
+       split: test
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: ???
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Samrómur (Dev)
+       type: language-and-voice-lab/samromur_asr
+       split: validation
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: ???
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Samrómur Children (Test)
+       type: language-and-voice-lab/samromur_children
+       split: test
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: ???
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Samrómur Children (Dev)
+       type: language-and-voice-lab/samromur_children
+       split: validation
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: ???
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Málrómur (Test)
+       type: language-and-voice-lab/malromur_asr
+       split: test
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: ???
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Málrómur (Dev)
+       type: language-and-voice-lab/malromur_asr
+       split: validation
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: ???
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Althingi (Test)
+       type: althingi_test
+       split: test
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: ???
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Althingi (Dev)
+       type: althingi_dev
+       split: validation
+       args:
+         language: is
+     metrics:
+     - name: WER
+       type: wer
+       value: ???
  ---
+ # wav2vec2-large-xlsr-53-icelandic-ep10-1000h
+
+ "wav2vec2-large-xlsr-53-icelandic-ep10-1000h" is an acoustic model for Automatic Speech Recognition in Icelandic. It is the result of fine-tuning the model "facebook/wav2vec2-large-xlsr-53" for 10 epochs on around 1000 hours of Icelandic data developed by the [Language and Voice Laboratory](https://huggingface.co/language-and-voice-lab). Most of the data is available in public repositories such as [LDC](https://www.ldc.upenn.edu/), [OpenSLR](https://openslr.org/) or [Clarin.is](https://clarin.is/).
+
+ The corpora used to fine-tune the model are:
+
+ - [Samrómur 21.05 (114h34m)](http://www.openslr.org/112/)
+ - [Samrómur Children (127h25m)](https://catalog.ldc.upenn.edu/LDC2022S11)
+ - [Málrómur (119h03m)](https://clarin.is/en/resources/malromur/)
+ - [Althingi Parliamentary Speech (514h29m)](https://catalog.ldc.upenn.edu/LDC2021S01)
+ - L2-Speakers Data (125h55m) **Unpublished material**
+
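The "around 1000 hours" figure can be checked against the per-corpus durations in the list above. The snippet below is a small illustrative sketch, not part of the model card: the dictionary and the helper `to_minutes` are hypothetical names, and the Málrómur entry is read as 119h03m.

```python
import re

# Durations copied from the corpus list above.
# Assumption: the Málrómur entry is read as 119h03m.
CORPUS_DURATIONS = {
    "Samrómur 21.05": "114h34m",
    "Samrómur Children": "127h25m",
    "Málrómur": "119h03m",
    "Althingi Parliamentary Speech": "514h29m",
    "L2-Speakers Data": "125h55m",
}


def to_minutes(duration: str) -> int:
    """Convert a string such as '114h34m' into a number of minutes."""
    match = re.fullmatch(r"(\d+)h(\d+)m", duration)
    if match is None:
        raise ValueError(f"Unexpected duration format: {duration!r}")
    hours, minutes = (int(group) for group in match.groups())
    return hours * 60 + minutes


total_minutes = sum(to_minutes(d) for d in CORPUS_DURATIONS.values())
total_hours, rem_minutes = divmod(total_minutes, 60)
print(f"Total: {total_hours}h{rem_minutes:02d}m")  # Total: 1001h26m
```

The five corpora add up to 1001h26m, which is consistent with the "1000h" suffix in the model name.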
+ The fine-tuning was performed in December 2022 on the servers of the [Language and Voice Laboratory](https://lvl.ru.is/) at Reykjavík University (Iceland) by Carlos Daniel Hernández Mena.
+
+ # Evaluation
+ ```python
+ import numpy as np
+ import torch
+ import evaluate
+ from datasets import load_dataset, Audio
+ from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
+
+ # Load the processor and model.
+ MODEL_NAME = "carlosdanielhernandezmena/wav2vec2-large-xlsr-53-icelandic-ep10-1000h"
+ processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
+ model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME)
+
+ # Load the dataset.
+ ds = load_dataset("language-and-voice-lab/samromur_children", split="test")
+
+ # Resample the audio to 16 kHz, the rate the model expects.
+ ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
+
+ # Process the dataset.
+ def prepare_dataset(batch):
+     audio = batch["audio"]
+     # The processor returns a batch; take element 0 to "un-batch" it.
+     batch["input_values"] = processor(audio["array"], sampling_rate=audio["sampling_rate"]).input_values[0]
+     # Tokenize the transcript directly (`as_target_processor()` is deprecated).
+     batch["labels"] = processor.tokenizer(batch["normalized_text"]).input_ids
+     return batch
+ ds = ds.map(prepare_dataset, remove_columns=ds.column_names, num_proc=1)
+
+ # Define the evaluation metric.
+ # `datasets.load_metric` was removed from recent `datasets` releases; `evaluate.load` replaces it.
+ wer_metric = evaluate.load("wer")
+
+ # Not called below; usable as a `compute_metrics` callback for `transformers.Trainer`.
+ def compute_metrics(pred):
+     pred_logits = pred.predictions
+     pred_ids = np.argmax(pred_logits, axis=-1)
+     # Restore the padding id where the labels were masked with -100.
+     pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id
+     pred_str = processor.batch_decode(pred_ids)
+     # We do not want to group tokens when decoding the reference labels.
+     label_str = processor.batch_decode(pred.label_ids, group_tokens=False)
+     wer = wer_metric.compute(predictions=pred_str, references=label_str)
+     return {"wer": wer}
+
+ # Do the evaluation (with batch_size=1).
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ model = model.to(device)
+ def map_to_result(batch):
+     with torch.no_grad():
+         input_values = torch.tensor(batch["input_values"], device=device).unsqueeze(0)
+         logits = model(input_values).logits
+     pred_ids = torch.argmax(logits, dim=-1)
+     batch["pred_str"] = processor.batch_decode(pred_ids)[0]
+     batch["sentence"] = processor.decode(batch["labels"], group_tokens=False)
+     return batch
+ results = ds.map(map_to_result, remove_columns=ds.column_names)
+
+ # Compute the overall WER.
+ print("Test WER: {:.3f}".format(wer_metric.compute(predictions=results["pred_str"], references=results["sentence"])))
+ ```
+ **Test Result**: ???
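For readers unfamiliar with the metric, the word error rate (WER) printed by the evaluation script is the number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The function below is a minimal sketch of that standard definition for a single sentence pair; it is not the implementation behind the `wer` metric used in the script, and `word_error_rate` is a hypothetical name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("The reference must contain at least one word.")
    # previous[j]: edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One substitution ("daginn" -> "dag") and one deletion ("vinur") over 4 words.
print(word_error_rate("góðan daginn kæri vinur", "góðan dag kæri"))  # 0.5
```

Because insertions count as errors, the WER can exceed 1.0 when the hypothesis is much longer than the reference.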
+
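The evaluation script decodes predictions with token grouping but decodes the reference labels with `group_tokens=False`. The sketch below illustrates why, using plain Python rather than the processor itself. It assumes a toy vocabulary in which id 0 plays the role of the CTC blank (in wav2vec2 models the tokenizer's padding token is used for this); `ctc_greedy_collapse` and `vocab` are hypothetical names.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Merge consecutive repeated ids, then drop the blank id."""
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    return [token_id for token_id in merged if token_id != blank_id]


# Hypothetical toy vocabulary; id 0 is assumed to be the CTC blank.
vocab = {0: "<pad>", 1: "h", 2: "a", 3: "l", 4: "ó"}

# Frame-level argmax output: h h _ a a l _ l ó ó
frames = [1, 1, 0, 2, 2, 3, 0, 3, 4, 4]
decoded = "".join(vocab[i] for i in ctc_greedy_collapse(frames))
print(decoded)  # halló

# Reference labels contain no blanks, so grouping them would wrongly
# merge the genuine double letter.
labels = [1, 2, 3, 3, 4]
regrouped = "".join(vocab[i] for i in ctc_greedy_collapse(labels))
print(regrouped)  # haló
```

Predictions need the collapse step because the model emits one id per audio frame, whereas the labels are already one id per character and must be decoded as they are.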
+ # BibTeX entry and citation info
+ *When publishing results based on this model, please cite:*
+ ```bibtex
+ @misc{mena2022xlrs53icelandic,
+   title={Acoustic Model in Icelandic: wav2vec2-large-xlsr-53-icelandic-ep10-1000h},
+   author={Hernandez Mena, Carlos Daniel},
+   year={2022},
+   url={https://huggingface.co/carlosdanielhernandezmena/wav2vec2-large-xlsr-53-icelandic-ep10-1000h},
+ }
+ ```
+
+ # Acknowledgements
+
+ Special thanks to Jón Guðnason, head of the Language and Voice Lab, for providing the computational power that made this model possible. We also thank the "Language Technology Programme for Icelandic 2019-2023", which is managed and coordinated by Almannarómur and funded by the Icelandic Ministry of Education, Science and Culture.