Update the model #6, opened by igitman
- README.md +14 -13
- stt_it_fastconformer_hybrid_large_pc.nemo +1 -1
README.md
CHANGED
@@ -35,7 +35,7 @@ model-index:
 metrics:
 - name: Test WER
 type: wer
-value: 5.
+value: 5.64
 - task:
 type: Automatic Speech Recognition
 name: automatic-speech-recognition
@@ -49,7 +49,7 @@ model-index:
 metrics:
 - name: Test WER
 type: wer
-value: 11.
+value: 11.39
 - task:
 type: Automatic Speech Recognition
 name: speech-recognition
@@ -63,7 +63,7 @@ model-index:
 metrics:
 - name: Test WER
 type: wer
-value: 16.
+value: 16.22
 - task:
 type: Automatic Speech Recognition
 name: speech-recognition
@@ -77,7 +77,7 @@ model-index:
 metrics:
 - name: Test WER P&C
 type: wer
-value: 8.
+value: 8.11
 - task:
 type: Automatic Speech Recognition
 name: automatic-speech-recognition
@@ -91,7 +91,7 @@ model-index:
 metrics:
 - name: Test WER P&C
 type: wer
-value:
+value: 18.27
 - task:
 type: Automatic Speech Recognition
 name: speech-recognition
@@ -105,7 +105,7 @@ model-index:
 metrics:
 - name: Test WER P&C
 type: wer
-value: 19.
+value: 19.97
 ---
 # NVIDIA FastConformer-Hybrid Large (it)

@@ -191,9 +191,9 @@ The tokenizers for these models were built using the text transcripts of the tra

 The model in this collection are trained on a composite dataset (NeMo PnC IT ASRSET) comprising of 487 hours of Italian speech:

-- Mozilla Common Voice 12.0 (Italian) - 220 hours after data cleaning
-- Multilingual LibriSpeech (Italian) - 214 hours after data cleaning
-- VoxPopuli transcribed subset (Italian) - 53 hours after data cleaning
+- Mozilla Common Voice 12.0 (Italian) - 220 hours after data cleaning. [Speech Data Processor](https://github.com/NVIDIA/NeMo-speech-data-processor) config used to prepare this data is [here](https://github.com/NVIDIA/NeMo-speech-data-processor/blob/main/dataset_configs/italian/mcv/config.yaml).
+- Multilingual LibriSpeech (Italian) - 214 hours after data cleaning. [Speech Data Processor](https://github.com/NVIDIA/NeMo-speech-data-processor) config used to prepare this data is [here](https://github.com/NVIDIA/NeMo-speech-data-processor/blob/main/dataset_configs/italian/mls/config.yaml).
+- VoxPopuli transcribed subset (Italian) - 53 hours after data cleaning. [Speech Data Processor](https://github.com/NVIDIA/NeMo-speech-data-processor) config used to prepare this data is [here](https://github.com/NVIDIA/NeMo-speech-data-processor/blob/main/dataset_configs/italian/voxpopuli/config.yaml).

 ## Performance

@@ -206,15 +206,16 @@ a) On data without Punctuation and Capitalization

 | Version | Tokenizer | Vocabulary Size | MCV 12.0 Dev | MCV 12.0 Test | MLS Dev | MLS Test | VoxPopuli Dev | VoxPopuli Test |
 |---------|-----------------------|-----------------|--------------|---------------|---------|----------|---------------|----------------|
-| 1.20.0 | SentencePiece BPE | 512 | 5.
+| 1.20.0 | SentencePiece BPE | 512 | 5.19% | 5.64% | 13.01% | 11.39% | 13.02% | 16.22% |


 b) On data with Punctuation and Capitalization

-| Version | Tokenizer | Vocabulary Size | MCV 12.0 Dev | MCV 12.0 Test | MLS Dev | MLS Test | VoxPopuli Dev | VoxPopuli Test |
-
-| 1.20.0 | SentencePiece BPE | 512 | 7.
+| Version | Tokenizer | Vocabulary Size | MCV 12.0 Dev | MCV 12.0 Test | MLS Dev\* | MLS Test\* | VoxPopuli Dev | VoxPopuli Test |
+|---------|-----------------------|-----------------|--------------|---------------|-----------|------------|---------------|----------------|
+| 1.20.0 | SentencePiece BPE | 512 | 7.70% | 8.11% | 21.69% | 18.27% | 16.96% | 19.97% |

+\* We use only a subset of dev/test sets with P&C restored from the original books

 ## Limitations
 Since this model was trained on publically available speech datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech. The model only outputs the punctuations: ```'.', ',', '?' ``` and hence might not do well in scenarios where other punctuations are also expected.
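For readers comparing the two result tables in the README diff, here is a minimal sketch of how a word error rate (WER) figure is typically computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function names `wer` and `normalize` are illustrative, not part of NeMo, and the normalization shown (lowercasing and stripping ASCII punctuation) is only an assumption about how the "without P&C" numbers might be scored; the PR does not state the exact procedure.

```python
import string


def normalize(text: str) -> str:
    """Lowercase and drop ASCII punctuation.

    Assumed normalization for scoring without punctuation and
    capitalization; note it also removes apostrophes.
    """
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def wer(reference: str, hypothesis: str) -> float:
    """Return WER in percent, using whitespace tokenization."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping one row of the DP table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "Ciao, come stai?"
    hypothesis = "ciao come stai"
    # Scored with punctuation and capitalization, two of three words differ.
    print(f"WER P&C: {wer(reference, hypothesis):.2f}%")
    # Scored after the assumed normalization, the strings match.
    print(f"WER:     {wer(normalize(reference), normalize(hypothesis)):.2f}%")
```

Under these assumptions, the same transcript scores worse when punctuation and casing count as part of each word, which is consistent with the P&C table reporting higher error rates than the table without P&C.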
stt_it_fastconformer_hybrid_large_pc.nemo
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:6db62aeda2dd05fe99e827f734e3b94f73b59f69f2a012e46668451f292baecb
 size 455505920
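The `.nemo` change above only touches a Git LFS pointer, so the new checkpoint is identified by its `oid sha256:` and `size` fields. A minimal sketch, assuming the checkpoint has already been downloaded to a local path, of checking a file against such a pointer; the helper names `parse_lfs_pointer` and `matches_pointer` and the local path in the usage example are hypothetical and not part of any Git LFS or NeMo tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(pointer_text: str) -> dict:
    """Split each non-empty 'key value' line of an LFS pointer into a dict."""
    fields = {}
    for line in pointer_text.splitlines():
        if line.strip():
            key, _, value = line.strip().partition(" ")
            fields[key] = value.strip()
    return fields


def matches_pointer(path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algorithm, _, expected_digest = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm!r}")

    path = Path(path)
    # Cheap check first: a size mismatch rules the file out without hashing.
    if path.stat().st_size != int(fields["size"]):
        return False

    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in chunks so a ~455 MB checkpoint is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest


if __name__ == "__main__":
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:6db62aeda2dd05fe99e827f734e3b94f73b59f69f2a012e46668451f292baecb\n"
        "size 455505920\n"
    )
    # Hypothetical local path; adjust to wherever the file was downloaded.
    local_file = Path("stt_it_fastconformer_hybrid_large_pc.nemo")
    if local_file.exists():
        print("matches pointer:", matches_pointer(local_file, pointer))
```

Because the `size` line is identical on both sides of the diff, only the digest comparison can distinguish the updated checkpoint from the previous one.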