Update README.md
README.md
@@ -182,7 +182,15 @@ The tokenizer for this model was built using the text transcripts of the train
 
 ### Datasets
 
-Model is trained on validated Mozilla Common Voice Corpus 10.0 dataset(excluding dev and test data) comprising of 69 hours of Ukrainian speech.
+The model is trained on the validated Mozilla Common Voice Corpus 10.0 dataset (excluding dev and test data), comprising 69 hours of Ukrainian speech.
+
+## Performance
+
+Results are reported as word error rate (WER, %) on the Mozilla Common Voice (MCV) test and dev splits.
+
+| Version | Tokenizer | Vocabulary Size | MCV-8 test | MCV-8 dev | MCV-9 test | MCV-9 dev | MCV-10 test | MCV-10 dev |
+| :-----: | :-------------------: | :-------------: | :--------: | :-------: | :--------: | :-------: | :---------: | :--------: |
+| 1.0.0 | SentencePiece Unigram | 1024 | 4.27 | 5.66 | 4.45 | 5.57 | 5.53 | 5.30 |
 
 ## Limitations
 
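For readers reproducing the Datasets note, here is a minimal sketch of the described filtering, assuming the raw Common Voice 10.0 download with its standard TSV metadata; the directory name, pandas usage, and printout are illustrative, not taken from the card:

```python
# Minimal sketch (assumed workflow, not from the card): build the training
# pool from Common Voice 10.0 Ukrainian "validated" clips, excluding every
# clip that also appears in the dev or test splits.
import pandas as pd

CV_DIR = "cv-corpus-10.0-2022-07-04/uk"  # hypothetical local path to the CV download

validated = pd.read_csv(f"{CV_DIR}/validated.tsv", sep="\t")
dev = pd.read_csv(f"{CV_DIR}/dev.tsv", sep="\t")
test = pd.read_csv(f"{CV_DIR}/test.tsv", sep="\t")

# Common Voice identifies each clip by its audio filename in the `path` column.
held_out = set(dev["path"]) | set(test["path"])
train = validated[~validated["path"].isin(held_out)]

print(f"kept {len(train)} of {len(validated)} validated clips for training")
```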
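And since the new Performance table reports WER, a tiny worked example of the metric, using the `jiwer` package as a stand-in (the card does not say which tool was used for scoring):

```python
# Word error rate in miniature: word-level edit distance divided by the
# number of reference words. `jiwer` is an illustrative choice of tool.
import jiwer

reference = "кіт сидить на столі"     # "the cat sits on the table"
hypothesis = "кіт сидить на стільці"  # one substituted word
print(f"WER = {jiwer.wer(reference, hypothesis):.2%}")  # 1/4 words -> 25.00%
```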