alokmatta committed on
Commit
84c4b32
1 Parent(s): a4cf49b

Initial commit

.ipynb_checkpoints/README-checkpoint.md ADDED
@@ -0,0 +1,128 @@
+ language: {lang_id} #TODO: replace {lang_id} with your language code. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
+ datasets:
+ - common_voice #TODO: remove if you did not use the common voice dataset
+ - TODO: add more datasets if you have used additional datasets. Make sure to use the exact same
+   dataset name as the one found [here](https://huggingface.co/datasets). If the dataset cannot be found in the official datasets, just give it a new name
+ metrics:
+ - wer
+ tags:
+ - audio
+ - automatic-speech-recognition
+ - speech
+ - xlsr-fine-tuning-week
+ license: apache-2.0
+ model-index:
+ - name: {human_readable_name} #TODO: replace {human_readable_name} with a name of your model as it should appear on the leaderboard. It could be something like `Elgeish XLSR Wav2Vec2 Large 53`
+   results:
+   - task:
+       name: Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Common Voice {lang_id} #TODO: replace {lang_id} with your language code. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
+       type: common_voice
+       args: {lang_id} #TODO: replace {lang_id} with your language code. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
+     metrics:
+     - name: Test WER
+       type: wer
+       value: {wer_result_on_test} #TODO (IMPORTANT): replace {wer_result_on_test} with the word error rate (WER) you achieved on the common_voice test set. It should be in the format XX.XX (don't add the % sign here). **Please** remember to fill out this value after you evaluated your model, so that your model appears on the leaderboard. If you fill out this model card before evaluating your model, please remember to edit the model card afterward to fill in your value
+ ---
+
+ # Wav2Vec2-Large-XLSR-53-{language} #TODO: replace {language} with your language, *e.g.* French
+
+ Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on {language} using the [Common Voice](https://huggingface.co/datasets/common_voice), ... and ... dataset{s}. #TODO: replace {language} with your language, *e.g.* French, add any other datasets that were used, and remove Common Voice if the model was not trained on it
+ When using this model, make sure that your speech input is sampled at 16kHz.
+
+ ## Usage
+
+ The model can be used directly (without a language model) as follows:
+
+ ```python
+ import torch
+ import torchaudio
+ from datasets import load_dataset
+ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+
+ test_dataset = load_dataset("common_voice", "{lang_id}", split="test[:2%]") #TODO: replace {lang_id} with your language code. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
+
+ processor = Wav2Vec2Processor.from_pretrained("{model_id}") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
+ model = Wav2Vec2ForCTC.from_pretrained("{model_id}") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
+
+ resampler = torchaudio.transforms.Resample(48_000, 16_000)
+
+ # Preprocessing the datasets.
+ # We need to read the audio files as arrays
+ def speech_file_to_array_fn(batch):
+     speech_array, sampling_rate = torchaudio.load(batch["path"])
+     batch["speech"] = resampler(speech_array).squeeze().numpy()
+     return batch
+
+ test_dataset = test_dataset.map(speech_file_to_array_fn)
+ inputs = processor(test_dataset["speech"][:2], sampling_rate=16_000, return_tensors="pt", padding=True)
+
+ with torch.no_grad():
+     logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
+
+ predicted_ids = torch.argmax(logits, dim=-1)
+
+ print("Prediction:", processor.batch_decode(predicted_ids))
+ print("Reference:", test_dataset["sentence"][:2])
+ ```
+
+
+ ## Evaluation
+
+ The model can be evaluated as follows on the {language} test data of Common Voice. #TODO: replace {language} with your language, *e.g.* French
+
+
+ ```python
+ import torch
+ import torchaudio
+ from datasets import load_dataset, load_metric
+ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+ import re
+
+ test_dataset = load_dataset("common_voice", "{lang_id}", split="test") #TODO: replace {lang_id} with your language code. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
+ wer = load_metric("wer")
+
+ processor = Wav2Vec2Processor.from_pretrained("{model_id}") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
+ model = Wav2Vec2ForCTC.from_pretrained("{model_id}") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
+ model.to("cuda")
+
+ chars_to_ignore_regex = r'[\,\?\.\!\-\;\:\"\“]' # TODO: adapt this list to include all special characters you removed from the data
+ resampler = torchaudio.transforms.Resample(48_000, 16_000)
+
+ # Preprocessing the datasets.
+ # We need to read the audio files as arrays
+ def speech_file_to_array_fn(batch):
+     batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
+     speech_array, sampling_rate = torchaudio.load(batch["path"])
+     batch["speech"] = resampler(speech_array).squeeze().numpy()
+     return batch
+
+ test_dataset = test_dataset.map(speech_file_to_array_fn)
+
+ # Running the model on the datasets.
+ # We transcribe each batch and store the decoded predictions
+ def evaluate(batch):
+     inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
+
+     with torch.no_grad():
+         logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits
+
+     pred_ids = torch.argmax(logits, dim=-1)
+     batch["pred_strings"] = processor.batch_decode(pred_ids)
+     return batch
+
+ result = test_dataset.map(evaluate, batched=True, batch_size=8)
+
+ print("WER: {:.2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
+ ```
+
+ **Test Result**: XX.XX % # TODO: write output of print here. IMPORTANT: Please remember to also replace {wer_result_on_test} in the tags at the top of this model card with this value.
+
+
+ ## Training
+
+ The Common Voice `train`, `validation`, and ... datasets were used for training as well as ... and ... # TODO: adapt to state all the datasets that were used for training.
+
+ The script used for training can be found [here](...) # TODO: fill in a link to your training script here. If you trained your model in a Colab, simply fill in the link here. If you trained the model locally, it would be great if you could upload the training script to GitHub and paste the link here.
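
Note on the evaluation script in the file above: it lowercases each reference sentence, strips the characters in `chars_to_ignore_regex`, and then reports the word error rate. A minimal pure-Python sketch of those two steps follows, assuming a corpus-level WER (total word edits divided by total reference words). The helper names and the two sample sentences are made up for illustration, and the edit-distance routine is a simplified stand-in, not the implementation behind `load_metric("wer")`.

```python
import re

# Same character class as in the evaluation script, written as a raw string.
CHARS_TO_IGNORE = r'[\,\?\.\!\-\;\:\"\“]'


def normalize(sentence: str) -> str:
    """Strip the ignored punctuation and lowercase the sentence."""
    return re.sub(CHARS_TO_IGNORE, "", sentence).lower()


def word_edit_distance(reference: list, hypothesis: list) -> int:
    """Levenshtein distance over words (substitutions, deletions, insertions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def corpus_wer(references: list, predictions: list) -> float:
    """Total word errors divided by total reference words."""
    errors = 0
    words = 0
    for ref, pred in zip(references, predictions):
        ref_words = ref.split()
        errors += word_edit_distance(ref_words, pred.split())
        words += len(ref_words)
    return errors / words


# Hypothetical references and predictions, for illustration only.
references = [normalize("Habari, dunia!"), normalize("Asante sana.")]
predictions = ["habari dunia", "asante"]
print("WER: {:.2f}".format(100 * corpus_wer(references, predictions)))  # WER: 25.00
```

One of the four reference words is missing from the predictions, so the sketch reports 25.00, which is the `XX.XX` format the model card asks for.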
README.md CHANGED
@@ -1,6 +1,6 @@
- language: {lang_id} #TODO: replace {lang_id} in your language code here. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
+ language: sw
  datasets:
- - common_voice #TODO: remove if you did not use the common voice dataset
+ - OpenSLR - http://www.openslr.org/25/
  - TODO: add more datasets if you have used additional datasets. Make sure to use the exact same
    dataset name as the one found [here](https://huggingface.co/datasets). If the dataset can not be found in the official datasets, just give it a new name
  metrics:
@@ -12,7 +12,7 @@ tags:
  - xlsr-fine-tuning-week
  license: apache-2.0
  model-index:
- - name: {human_readable_name} #TODO: replace {human_readable_name} with a name of your model as it should appear on the leaderboard. It could be something like `Elgeish XLSR Wav2Vec2 Large 53`
+ - name: Swahili XLSR Wav2Vec2 Large 53
    results:
    - task:
        name: Speech Recognition
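
Note on the README.md change: the context lines show that a `TODO` entry is still present under `datasets` after this commit. A small check along the following lines could flag such leftovers before pushing. This is only a sketch: `find_unfilled_lines` is a hypothetical helper, the placeholder pattern assumes the template's lowercase `{snake_case}` style, and the sample text is a made-up excerpt rather than the actual file.

```python
import re

# Matches template placeholders such as {lang_id} or {wer_result_on_test}.
PLACEHOLDER = re.compile(r"\{[a-z_]+\}")


def find_unfilled_lines(text: str) -> list:
    """Return (line_number, line) pairs that still contain a placeholder or a TODO."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if PLACEHOLDER.search(line) or "TODO" in line:
            hits.append((number, line.strip()))
    return hits


# Made-up excerpt in the spirit of a partially filled model card.
sample = """language: sw
datasets:
- TODO: add more datasets if you have used additional datasets.
model-index:
- name: Swahili XLSR Wav2Vec2 Large 53
      value: {wer_result_on_test}
"""

for number, line in find_unfilled_lines(sample):
    print(f"{number}: {line}")
```

On this sample the check reports lines 3 and 6. The pattern would also match literal text such as `dataset{s}`, so any hits are worth reviewing by hand.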