skylord committed
Commit 287140f
1 Parent(s): 125194e

Updated readme for wer

.ipynb_checkpoints/README-checkpoint.md ADDED
---
language: el
datasets:
- common_voice
metrics:
- wer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: Greek XLSR Wav2Vec2 Large 53
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice el
      type: common_voice
      args: el
    metrics:
    - name: Test WER
      type: wer
      value: 34.006258
---

# Wav2Vec2-Large-XLSR-53-Greek

Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Greek using the [Common Voice](https://huggingface.co/datasets/common_voice) dataset.
When using this model, make sure that your speech input is sampled at 16 kHz.
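If your audio is at a different sampling rate, a minimal sketch along these lines can bring it to 16 kHz first (the path `clip.wav` is a placeholder for your own file, not something shipped with this model):

```python
import torchaudio

# Load an arbitrary clip and resample it to the 16 kHz the model expects.
speech_array, sampling_rate = torchaudio.load("clip.wav")  # placeholder path
if sampling_rate != 16_000:
    speech_array = torchaudio.transforms.Resample(sampling_rate, 16_000)(speech_array)
```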

## Usage

The model can be used directly (without a language model) as follows:

```python
import torch
import torchaudio
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

test_dataset = load_dataset("common_voice", "el", split="test[:2%]")

processor = Wav2Vec2Processor.from_pretrained("skylord/greek_lsr_1")
model = Wav2Vec2ForCTC.from_pretrained("skylord/greek_lsr_1")

# Common Voice clips are 48 kHz; downsample them to the model's 16 kHz.
resampler = torchaudio.transforms.Resample(48_000, 16_000)

# Preprocessing the datasets.
# We need to read the audio files as arrays.
def speech_file_to_array_fn(batch):
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)
inputs = processor(test_dataset["speech"][:2], sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

predicted_ids = torch.argmax(logits, dim=-1)
print("Prediction:", processor.batch_decode(predicted_ids))
print("Reference:", test_dataset["sentence"][:2])
```


## Evaluation

The model can be evaluated as follows on the Greek test data of Common Voice.

```python
import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import re

test_dataset = load_dataset("common_voice", "el", split="test")
wer = load_metric("wer")

processor = Wav2Vec2Processor.from_pretrained("skylord/greek_lsr_1")
model = Wav2Vec2ForCTC.from_pretrained("skylord/greek_lsr_1")
model.to("cuda")

# Punctuation stripped from the reference sentences before scoring.
chars_to_ignore_regex = '[\,\?\.\!\-\;\:\"\“]'
resampler = torchaudio.transforms.Resample(48_000, 16_000)

# Preprocessing the datasets.
# We need to read the audio files as arrays.
def speech_file_to_array_fn(batch):
    batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)

# Run batched inference and collect the predicted transcriptions.
def evaluate(batch):
    inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits
    pred_ids = torch.argmax(logits, dim=-1)
    batch["pred_strings"] = processor.batch_decode(pred_ids)
    return batch

result = test_dataset.map(evaluate, batched=True, batch_size=8)
print("WER: {:.2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
```

**Test Result**: 34.006258 %
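For reference, WER (word error rate) is the word-level edit distance, i.e. substitutions + insertions + deletions, divided by the number of words in the reference. A toy illustration using the same `load_metric("wer")` as in the script above (the sentences are invented for the example):

```python
from datasets import load_metric

wer = load_metric("wer")

# One deleted word ("the") against a 6-word reference -> 1/6 ≈ 0.17.
score = wer.compute(
    predictions=["the cat sat on mat"],
    references=["the cat sat on the mat"],
)
print(score)
```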


## Training

The Common Voice `train`, `validation`, and ... datasets were used for training as well as ... and ... # TODO: adapt to state all the datasets that were used for training.

The script used for training can be found [here](...) # TODO: fill in a link to your training script here. If you trained your model in a colab, simply fill in the link here. If you trained the model locally, it would be great if you could upload the training script on github and paste the link here.
README.md CHANGED
@@ -23,7 +23,7 @@ results:
     metrics:
     - name: Test WER
       type: wer
-      value: 56.253154
+      value: 34.006258
 ---
 
 # Wav2Vec2-Large-XLSR-53-Greek
@@ -114,11 +114,11 @@ result = test_dataset.map(evaluate, batched=True, batch_size=8)
 print("WER: {:.2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
 ```
 
-**Test Result**: 56.253154 %
+**Test Result**: 34.006258 %
 
 
 ## Training
 
-The Common Voice `train`, `validation`, and ... datasets were used for training as well as ... and ... # TODO: adapt to state all the datasets that were used for training.
+The Common Voice `train` and `validation` datasets were used for training.
 
 The script used for training can be found [here](...) # TODO: fill in a link to your training script here. If you trained your model in a colab, simply fill in the link here. If you trained the model locally, it would be great if you could upload the training script on github and paste the link here.