Update README.md
README.md
CHANGED
@@ -90,16 +90,6 @@ The tokenizer for this models was built using the text transcripts of the train
 
 Model is trained on Mozilla Common Voice Corpus 10.0 dataset comprising of 69 hours of Ukrainian speech.
 
-## Performance
-
-The list of the available models in this collection is shown in the following table. Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.
-
-| Version | Tokenizer | Vocabulary Size | MCV-8 test | MCV-8 dev | MCV-9 test | MCV-9 dev | MCV-10 test | MCV-10 dev | Train Dataset |
-|---------|-----------------------|-----------------|---------------|---------------|------------|-----------|-----|---------|
-| 1.0.0 | SentencePiece Unigram | 1024 | null | null | null | null | null | MCV-10 validated |
-
-While deploying with [NVIDIA Riva](https://developer.nvidia.com/riva), you can combine this model with external language models to further improve WER. The WER(%) of the latest model with different language modeling techniques are reported in the following table.
-
 ## Limitations
 
 Since this model was trained on publicly available speech datasets, the performance of this model might degrade for speech that includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.