Commit 9938d89 by jonatasgrosman
1 Parent(s): 7fee82d
update model
Files changed:
- README.md +14 -72
- config.json +1 -1
- preprocessor_config.json +1 -0
- pytorch_model.bin +1 -1
README.md
CHANGED
@@ -24,10 +24,10 @@ model-index:
 metrics:
 - name: Test WER
 type: wer
-value: 21.
+value: 21.53
 - name: Test CER
 type: cer
-value: 9.
+value: 9.66
 ---
 
 # Wav2vec2-Large-English
@@ -81,16 +81,16 @@ for i, predicted_sentence in enumerate(predicted_sentences):
 
 | Reference | Prediction |
 | ------------- | ------------- |
-| "SHE'LL BE ALL RIGHT." |
+| "SHE'LL BE ALL RIGHT." | SHELL BE ALL RIGHT |
 | SIX | SIX |
-| "ALL'S WELL THAT ENDS WELL." |
-| DO YOU MEAN IT? |
-| THE NEW PATCH IS LESS INVASIVE THAN THE OLD ONE, BUT STILL CAUSES REGRESSIONS. | THE NEW PATCH IS LESS INVASIVE THAN THE OLD ONE BUT STILL CAUSES
-| HOW IS MOZILLA GOING TO HANDLE AMBIGUITIES LIKE QUEUE AND CUE? | HOW IS
-| "I GUESS YOU MUST THINK I'M KINDA BATTY." |
+| "ALL'S WELL THAT ENDS WELL." | ALLAS WELL THAT ENDS WELL |
+| DO YOU MEAN IT? | W MEAN IT |
+| THE NEW PATCH IS LESS INVASIVE THAN THE OLD ONE, BUT STILL CAUSES REGRESSIONS. | THE NEW PATCH IS LESS INVASIVE THAN THE OLD ONE BUT STILL CAUSES REGRESTION |
+| HOW IS MOZILLA GOING TO HANDLE AMBIGUITIES LIKE QUEUE AND CUE? | HOW IS MOSILLA GOING TO BANDL AND BE WHIT IS LIKE QU AND QU |
+| "I GUESS YOU MUST THINK I'M KINDA BATTY." | RUSTION AS HAME AK AN THE POT |
 | NO ONE NEAR THE REMOTE MACHINE YOU COULD RING? | NO ONE NEAR THE REMOTE MACHINE YOU COULD RING |
-| SAUCE FOR THE GOOSE IS SAUCE FOR THE GANDER. | SAUCE FOR THE
-| GROVES STARTED WRITING SONGS WHEN SHE WAS FOUR YEARS OLD. |
+| SAUCE FOR THE GOOSE IS SAUCE FOR THE GANDER. | SAUCE FOR THE GUCE IS SAUCE FOR THE GONDER |
+| GROVES STARTED WRITING SONGS WHEN SHE WAS FOUR YEARS OLD. | GRAFS STARTED WRITING SONGS WHEN SHE WAS FOUR YEARS OLD |
 
 ## Evaluation
 
@@ -159,76 +159,18 @@ print(f"CER: {cer.compute(predictions=predictions, references=references, chunk_
 
 **Test Result**:
 
-In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-
-
----
-
-**Common Voice**
+In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-06-17). Note that the table below may show different results from those already reported, this may have been caused due to some specificity of the other evaluation scripts used.
 
 | Model | WER | CER |
 | ------------- | ------------- | ------------- |
-| jonatasgrosman/wav2vec2-large-xlsr-53-english | **
-| jonatasgrosman/wav2vec2-large-english | 21.
+| jonatasgrosman/wav2vec2-large-xlsr-53-english | **18.98%** | **8.29%** |
+| jonatasgrosman/wav2vec2-large-english | 21.53% | 9.66% |
 | facebook/wav2vec2-large-960h-lv60-self | 22.03% | 10.39% |
 | facebook/wav2vec2-large-960h-lv60 | 23.97% | 11.14% |
+| boris/xlsr-en-punctuation | 29.10% | 10.75% |
 | facebook/wav2vec2-large-960h | 32.79% | 16.03% |
-| boris/xlsr-en-punctuation | 34.81% | 15.51% |
 | facebook/wav2vec2-base-960h | 39.86% | 19.89% |
 | facebook/wav2vec2-base-100h | 51.06% | 25.06% |
 | elgeish/wav2vec2-large-lv60-timit-asr | 59.96% | 34.28% |
 | facebook/wav2vec2-base-10k-voxpopuli-ft-en | 66.41% | 36.76% |
 | elgeish/wav2vec2-base-timit-asr | 68.78% | 36.81% |
-
----
-
-**LibriSpeech (clean)**
-
-| Model | WER | CER |
-| ------------- | ------------- | ------------- |
-| facebook/wav2vec2-large-960h-lv60-self | **1.86%** | **0.54%** |
-| facebook/wav2vec2-large-960h-lv60 | 2.15% | 0.61% |
-| facebook/wav2vec2-large-960h | 2.82% | 0.84% |
-| facebook/wav2vec2-base-960h | 3.44% | 1.06% |
-| jonatasgrosman/wav2vec2-large-xlsr-53-english | 4.16% | 1.28% |
-| facebook/wav2vec2-base-100h | 6.26% | 2.00% |
-| jonatasgrosman/wav2vec2-large-english | 8.00% | 2.55% |
-| elgeish/wav2vec2-large-lv60-timit-asr | 15.53% | 4.93% |
-| boris/xlsr-en-punctuation | 19.28% | 6.45% |
-| elgeish/wav2vec2-base-timit-asr | 29.19% | 8.38% |
-| facebook/wav2vec2-base-10k-voxpopuli-ft-en | 31.82% | 12.41% |
-
----
-
-**LibriSpeech (other)**
-
-| Model | WER | CER |
-| ------------- | ------------- | ------------- |
-| facebook/wav2vec2-large-960h-lv60-self | **3.89%** | **1.40%** |
-| facebook/wav2vec2-large-960h-lv60 | 4.45% | 1.56% |
-| facebook/wav2vec2-large-960h | 6.49% | 2.52% |
-| jonatasgrosman/wav2vec2-large-xlsr-53-english | 8.82% | 3.42% |
-| facebook/wav2vec2-base-960h | 8.90% | 3.55% |
-| jonatasgrosman/wav2vec2-large-english | 13.62% | 5.24% |
-| facebook/wav2vec2-base-100h | 13.97% | 5.51% |
-| boris/xlsr-en-punctuation | 26.40% | 10.11% |
-| elgeish/wav2vec2-large-lv60-timit-asr | 28.39% | 12.08% |
-| elgeish/wav2vec2-base-timit-asr | 42.04% | 15.57% |
-| facebook/wav2vec2-base-10k-voxpopuli-ft-en | 45.19% | 20.32% |
-
----
-
-**TIMIT**
-
-| Model | WER | CER |
-| ------------- | ------------- | ------------- |
-| facebook/wav2vec2-large-960h-lv60-self | **5.17%** | **1.33%** |
-| facebook/wav2vec2-large-960h-lv60 | 6.24% | 1.54% |
-| jonatasgrosman/wav2vec2-large-xlsr-53-english | 6.81% | 2.02% |
-| facebook/wav2vec2-large-960h | 9.63% | 2.19% |
-| facebook/wav2vec2-base-960h | 11.48% | 2.76% |
-| elgeish/wav2vec2-large-lv60-timit-asr | 13.83% | 4.36% |
-| jonatasgrosman/wav2vec2-large-english | 13.91% | 4.01% |
-| facebook/wav2vec2-base-100h | 16.75% | 4.79% |
-| elgeish/wav2vec2-base-timit-asr | 25.40% | 8.16% |
-| boris/xlsr-en-punctuation | 25.93% | 9.99% |
-| facebook/wav2vec2-base-10k-voxpopuli-ft-en | 51.08% | 19.84% |
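The README changes above report WER and CER figures. As a rough illustration of what those two metrics measure, here is a minimal per-utterance sketch based on Levenshtein edit distance. It is not the evaluation script from the README: the function names `edit_distance`, `wer`, and `cer` are hypothetical, and the example assumes punctuation has already been stripped from the reference, which the real script may handle differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, prediction):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, prediction.split()) / len(ref_words)


def cer(reference, prediction):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, prediction) / len(reference)


# One reference/prediction pair from the table in the diff,
# with the trailing period removed by hand (an assumption).
reference = "SAUCE FOR THE GOOSE IS SAUCE FOR THE GANDER"
prediction = "SAUCE FOR THE GUCE IS SAUCE FOR THE GONDER"

print(f"WER: {wer(reference, prediction):.2%}")
print(f"CER: {cer(reference, prediction):.2%}")
```

Two of the nine reference words are substituted here, so the WER for this single utterance is 2/9. Corpus-level figures like those in the table are typically aggregated over the whole test set rather than averaged per sentence.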
config.json
CHANGED
@@ -64,6 +64,6 @@
 "num_feat_extract_layers": 7,
 "num_hidden_layers": 24,
 "pad_token_id": 0,
-"transformers_version": "4.
+"transformers_version": "4.7.0.dev0",
 "vocab_size": 33
 }
preprocessor_config.json
CHANGED
@@ -1,5 +1,6 @@
 {
 "do_normalize": true,
+"feature_extractor_type": "Wav2Vec2FeatureExtractor",
 "feature_size": 1,
 "padding_side": "right",
 "padding_value": 0.0,
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:ef32d54a3ebc911d64e7a8b8d04896f0957bbab4e3da4c6c0ae42ab901d6e4e7
 size 1262022892
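The `pytorch_model.bin` entry in this commit is a Git LFS pointer file rather than the weights themselves: three `key value` lines giving the spec version, the object hash, and the size in bytes. The sketch below shows one way such a pointer could be parsed. The helper name `parse_lfs_pointer` is hypothetical and not part of any Git LFS tooling, and the parser only handles the three keys visible in the diff.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict (illustrative helper only)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# The new pointer contents as they appear in the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ef32d54a3ebc911d64e7a8b8d04896f0957bbab4e3da4c6c0ae42ab901d6e4e7
size 1262022892
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```

Since the size is unchanged between the old and new pointers, only the `oid` line differs, which is why the diff shows a one-line change for a file of roughly 1.26 GB.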