comodoro committed on
Commit
de49766
1 Parent(s): 4f0c63f

Add results

Files changed (1)
  1. README.md +118 -3
README.md CHANGED
@@ -1,3 +1,118 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ language:
3
+ - pl
4
+ license: apache-2.0
5
+ tags:
6
+ - automatic-speech-recognition
7
+ - mozilla-foundation/common_voice_8_0
8
+ - robust-speech-event
9
+ - xlsr-fine-tuning-week
10
+ datasets:
11
+ - common_voice
12
+ model-index:
13
+ - name: Polish comodoro Wav2Vec2 XLSR 300M CV8
14
+   results:
15
+   - task:
16
+       name: Automatic Speech Recognition
17
+       type: automatic-speech-recognition
18
+     dataset:
19
+       name: Common Voice 8
20
+       type: mozilla-foundation/common_voice_8_0
21
+       args: pl
22
+     metrics:
23
+     - name: Test WER
24
+       type: wer
25
+       value: 17.0
26
+     - name: Test CER
27
+       type: cer
28
+       value: 3.8
29
+ ---
30
+ # wav2vec2-xls-r-300m-pl-cv8
31
+
32
+ This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the common_voice 8.0 dataset.
33
+ During training, it achieved the following results on the evaluation set:
34
+ - Loss: 0.1716
35
+ - Wer: 0.1697
36
+ - Cer: 0.0385
37
+
38
+ The `eval.py` script results are:
39
+ - WER: 0.16970531733661967
40
+ - CER: 0.03839135416519316
41
+
42
+ ## Model description
43
+
44
+ Fine-tuned [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on Polish using the [Common Voice](https://huggingface.co/datasets/common_voice) dataset.
45
+ When using this model, make sure that your speech input is sampled at 16 kHz.
46
+
47
+
48
+ The model can be used directly (without a language model) as follows:
49
+
50
+ ```python
51
+ import torch
52
+ import torchaudio
53
+ from datasets import load_dataset
54
+ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
55
+
56
+ test_dataset = load_dataset("mozilla-foundation/common_voice_8_0", "pl", split="test[:2%]")
57
+
58
+ processor = Wav2Vec2Processor.from_pretrained("comodoro/wav2vec2-xls-r-300m-pl-cv8")
59
+ model = Wav2Vec2ForCTC.from_pretrained("comodoro/wav2vec2-xls-r-300m-pl-cv8")
60
+
61
+ resampler = torchaudio.transforms.Resample(48_000, 16_000)
62
+
63
+ # Preprocessing the datasets.
64
+ # We need to read the audio files as arrays
65
+ def speech_file_to_array_fn(batch):
66
+     speech_array, sampling_rate = torchaudio.load(batch["path"])
67
+     batch["speech"] = resampler(speech_array).squeeze().numpy()
68
+     return batch
69
+
70
+ test_dataset = test_dataset.map(speech_file_to_array_fn)
71
+ inputs = processor(test_dataset[:2]["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
72
+
73
+ with torch.no_grad():
74
+     logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
75
+
76
+ predicted_ids = torch.argmax(logits, dim=-1)
77
+
78
+ print("Prediction:", processor.batch_decode(predicted_ids))
79
+ print("Reference:", test_dataset[:2]["sentence"])
80
+ ```
81
+
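The `torch.argmax` and `batch_decode` steps in the snippet above amount to greedy CTC decoding: take the most likely token per frame, merge consecutive repeats, then drop the blank token. The following is a minimal sketch of that idea, not the tokenizer's actual implementation. The toy vocabulary, the `ctc_greedy_decode` name, and the choice of `0` as blank and `|` as word delimiter are assumptions made for illustration.

```python
def ctc_greedy_decode(ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse repeated ids, drop blanks, and join tokens into text."""
    tokens = []
    previous = None
    for i in ids:
        # Emit a token only when it differs from the previous frame
        # and is not the CTC blank.
        if i != previous and i != blank_id:
            tokens.append(id_to_token[i])
        previous = i
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary (the real model has its own vocab.json).
toy_vocab = {0: "<pad>", 1: "|", 2: "k", 3: "o", 4: "t"}
frame_ids = [2, 2, 0, 3, 3, 0, 0, 4, 1, 1, 0, 2, 3]
decoded = ctc_greedy_decode(frame_ids, toy_vocab)
```

A blank between two identical ids keeps both letters, which is how CTC represents doubled characters.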
82
+ ## Evaluation
83
+
84
+ The model can be evaluated using the attached `eval.py` script:
85
+ ```
86
+ python eval.py --model_id comodoro/wav2vec2-xls-r-300m-pl-cv8 --dataset mozilla-foundation/common_voice_8_0 --split test --config pl
87
+ ```
88
+
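For readers unfamiliar with the reported metrics, WER and CER are both edit-distance ratios: the number of substitutions, insertions, and deletions needed to turn the prediction into the reference, divided by the reference length in words or characters. The sketch below illustrates that definition in plain Python. It is not the code used by `eval.py`, which may normalize text differently, and the function names and sample sentences are made up for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level rate: total edits divided by total reference units."""
    edits = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        edits += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    return edits / total


# Made-up sentences, purely for illustration.
refs = ["ala ma kota"]
hyps = ["ala ma psa"]
wer = error_rate(refs, hyps, unit="word")  # 1 wrong word out of 3
cer = error_rate(refs, hyps, unit="char")  # 3 edits over 11 characters
```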
89
+ ## Training and evaluation data
90
+
91
+ The Common Voice 8.0 `train` and `validation` splits were used for training.
92
+
93
+ ## Training procedure
94
+
95
+ ### Training hyperparameters
96
+
97
+ The following hyperparameters were used:
98
+
99
+ - learning_rate: 1e-4
100
+ - train_batch_size: 32
101
+ - eval_batch_size: 8
102
+ - seed: 42
103
+ - gradient_accumulation_steps: 1
104
+ - total_train_batch_size: 640
105
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
106
+ - lr_scheduler_type: linear
107
+ - lr_scheduler_warmup_steps: 500
108
+ - num_epochs: 150
109
+ - mixed_precision_training: Native AMP
110
+
111
+ The training was interrupted after 3250 steps.
112
+
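To make the `linear` scheduler with 500 warmup steps concrete, the sketch below computes the learning rate at a given step: it rises linearly from 0 to the peak of 1e-4 during warmup, then decays linearly to 0 at the planned end of training. The `total_steps` value is a placeholder assumption, since the card gives only the epoch count and the interruption point, not the planned number of steps. The function name is also hypothetical.

```python
def linear_warmup_lr(step, base_lr=1e-4, warmup_steps=500, total_steps=10_000):
    """Learning rate at `step` for linear warmup followed by linear decay.

    total_steps is a hypothetical placeholder, not a value from the card.
    """
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the step where training was interrupted,
# under the placeholder total_steps above.
lr_at_interrupt = linear_warmup_lr(3250)
```

Because the run stopped at step 3250, the learning rate never reached the end of the decay phase, whatever the planned total was.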
113
+ ### Framework versions
114
+
115
+ - Transformers 4.16.0.dev0
116
+ PyTorch 1.10.1+cu102
117
+ - Datasets 1.17.1.dev0
118
+ - Tokenizers 0.11.0