Librarian Bot: Add base_model information to model

#1
Files changed (1)
  1. README.md +90 -89
README.md CHANGED
@@ -10,122 +10,123 @@ tags:
 - hf-asr-leaderboard
 datasets:
 - common_voice
+base_model: facebook/wav2vec2-xls-r-300m
 model-index:
 - name: Slovak comodoro Wav2Vec2 XLSR 300M CV8
   results:
   - task:
-      name: Automatic Speech Recognition
       type: automatic-speech-recognition
+      name: Automatic Speech Recognition
     dataset:
       name: Common Voice 8
       type: mozilla-foundation/common_voice_8_0
       args: sk
     metrics:
-    - name: Test WER
-      type: wer
+    - type: wer
       value: 49.6
-    - name: Test CER
-      type: cer
+      name: Test WER
+    - type: cer
       value: 13.3
+      name: Test CER
   - task:
-      name: Automatic Speech Recognition
       type: automatic-speech-recognition
+      name: Automatic Speech Recognition
     dataset:
       name: Robust Speech Event - Dev Data
       type: speech-recognition-community-v2/dev_data
       args: sk
     metrics:
-    - name: Test WER
-      type: wer
+    - type: wer
       value: 81.7
+      name: Test WER
   - task:
-      name: Automatic Speech Recognition
       type: automatic-speech-recognition
+      name: Automatic Speech Recognition
     dataset:
       name: Robust Speech Event - Test Data
       type: speech-recognition-community-v2/eval_data
       args: sk
     metrics:
-    - name: Test WER
-      type: wer
+    - type: wer
       value: 80.26
+      name: Test WER
 ---
 
-
-# wav2vec2-xls-r-300m-cs-cv8
-
-This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the common_voice 8.0 dataset.
-It achieves the following results on the evaluation set:
-
-- WER: 0.49575384615384616
-- CER: 0.13333333333333333
-
-## Usage
-
-The model can be used directly (without a language model) as follows:
-
-```python
-import torch
-import torchaudio
-from datasets import load_dataset
-from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
-
-test_dataset = load_dataset("mozilla-foundation/common_voice_8_0", "sk", split="test[:2%]")
-
-processor = Wav2Vec2Processor.from_pretrained("comodoro/wav2vec2-xls-r-300m-sk-cv8")
-model = Wav2Vec2ForCTC.from_pretrained("comodoro/wav2vec2-xls-r-300m-sk-cv8")
-
-resampler = torchaudio.transforms.Resample(48_000, 16_000)
-
-# Preprocessing the datasets.
-# We need to read the aduio files as arrays
-def speech_file_to_array_fn(batch):
-    speech_array, sampling_rate = torchaudio.load(batch["path"])
-    batch["speech"] = resampler(speech_array).squeeze().numpy()
-    return batch
-
-test_dataset = test_dataset.map(speech_file_to_array_fn)
-inputs = processor(test_dataset[:2]["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
-
-with torch.no_grad():
-    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
-
-predicted_ids = torch.argmax(logits, dim=-1)
-
-print("Prediction:", processor.batch_decode(predicted_ids))
-print("Reference:", test_dataset[:2]["sentence"])
-```
-
-## Evaluation
-
-The model can be evaluated using the attached `eval.py` script:
-```
-python eval.py --model_id comodoro/wav2vec2-xls-r-300m-sk-cv8 --dataset mozilla-foundation/common_voice_8_0 --split test --config sk
-```
-
-## Training and evaluation data
-
-The Common Voice 8.0 `train` and `validation` datasets were used for training
-
-### Training hyperparameters
-
-The following hyperparameters were used during training:
-
-- learning_rate: 7e-4
-- train_batch_size: 32
-- eval_batch_size: 8
-- seed: 42
-- gradient_accumulation_steps: 20
-- total_train_batch_size: 640
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 500
-- num_epochs: 50
-- mixed_precision_training: Native AMP
-
-### Framework versions
-
-- Transformers 4.16.0.dev0
-- Pytorch 1.10.1+cu102
-- Datasets 1.17.1.dev0
-- Tokenizers 0.11.0
+
+# wav2vec2-xls-r-300m-cs-cv8
+
+This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the common_voice 8.0 dataset.
+It achieves the following results on the evaluation set:
+
+- WER: 0.49575384615384616
+- CER: 0.13333333333333333
+
+## Usage
+
+The model can be used directly (without a language model) as follows:
+
+```python
+import torch
+import torchaudio
+from datasets import load_dataset
+from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+
+test_dataset = load_dataset("mozilla-foundation/common_voice_8_0", "sk", split="test[:2%]")
+
+processor = Wav2Vec2Processor.from_pretrained("comodoro/wav2vec2-xls-r-300m-sk-cv8")
+model = Wav2Vec2ForCTC.from_pretrained("comodoro/wav2vec2-xls-r-300m-sk-cv8")
+
+resampler = torchaudio.transforms.Resample(48_000, 16_000)
+
+# Preprocessing the datasets.
+# We need to read the aduio files as arrays
+def speech_file_to_array_fn(batch):
+    speech_array, sampling_rate = torchaudio.load(batch["path"])
+    batch["speech"] = resampler(speech_array).squeeze().numpy()
+    return batch
+
+test_dataset = test_dataset.map(speech_file_to_array_fn)
+inputs = processor(test_dataset[:2]["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
+
+with torch.no_grad():
+    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
+
+predicted_ids = torch.argmax(logits, dim=-1)
+
+print("Prediction:", processor.batch_decode(predicted_ids))
+print("Reference:", test_dataset[:2]["sentence"])
+```
+
+## Evaluation
+
+The model can be evaluated using the attached `eval.py` script:
+```
+python eval.py --model_id comodoro/wav2vec2-xls-r-300m-sk-cv8 --dataset mozilla-foundation/common_voice_8_0 --split test --config sk
+```
+
+## Training and evaluation data
+
+The Common Voice 8.0 `train` and `validation` datasets were used for training
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+
+- learning_rate: 7e-4
+- train_batch_size: 32
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 20
+- total_train_batch_size: 640
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 500
+- num_epochs: 50
+- mixed_precision_training: Native AMP
+
+### Framework versions
+
+- Transformers 4.16.0.dev0
+- Pytorch 1.10.1+cu102
+- Datasets 1.17.1.dev0
+- Tokenizers 0.11.0