waveletdeboshir committed: Update README.md
Model was finetuned on russian part of [mozilla-foundation/common_voice_15_0](ht

| metric | dataset | waveletdeboshir/whisper-base-ru-pruned | waveletdeboshir/whisper-small-ru-pruned-finetuned |
| :------ | :------ | :------ | :------ |
| WER (without punctuation) | common_voice_15_0_test | | |
| WER | common_voice_15_0_test | | |

## Size
Only 10% of the tokens were kept, including the special Whisper tokens (no language tokens except \<|ru|\> and \<|en|\>, and no timestamp tokens), the 200 most popular tokens from the tokenizer, and the 4000 most popular Russian tokens computed by tokenizing a Russian text corpus.
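The "most popular tokens" selection described above can be sketched roughly as follows: count how often each token appears when the tokenizer is run over a corpus, then keep the top N. This is a minimal illustration, not the author's actual script; `toy_encode` and the three-line corpus are hypothetical stand-ins for a real tokenizer (e.g. a Whisper tokenizer's `encode`) and an actual Russian text corpus:

```python
from collections import Counter

def top_tokens(corpus, tokenize, n):
    """Return the n most frequent tokens produced by `tokenize`
    over an iterable of text lines (ties keep first-seen order)."""
    counts = Counter()
    for line in corpus:
        counts.update(tokenize(line))
    return [tok for tok, _ in counts.most_common(n)]

# Hypothetical stand-in for a real tokenizer's encode():
toy_encode = lambda s: s.lower().split()

corpus = ["привет мир", "привет всем", "мир труд"]
print(top_tokens(corpus, toy_encode, 2))  # → ['привет', 'мир']
```

In the real pruning, the surviving token ids would then be used to slice the embedding matrix and rebuild the tokenizer vocabulary, alongside the special tokens kept explicitly.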