---
license: cc-by-nc-sa-4.0
language:
- uk
tags:
- automatic-speech-recognition
---
## Community
- Discord: https://bit.ly/discord-uds
- Speech Recognition: https://t.me/speech_recognition_uk
- Speech Synthesis: https://t.me/speech_synthesis_uk
- Natural Language Processing: https://t.me/nlp_uk
## Overview
KenLM n-gram language models for Ukrainian, provided as 3-gram, 4-gram, and 5-gram variants in six sizes (10k to 500k).
## Metrics
Tested with the [w2v-xls-r-uk](https://huggingface.co/Yehor/w2v-xls-r-uk) acoustic model (lower CER and WER are better):
| Model | CER | WER |
|-|-|-|
| no LM | 0.0412 | 0.2206 |
| lm-3gram-10k (alpha=0.1) | 0.0398 | 0.2191 |
| lm-4gram-10k (alpha=0.1) | 0.0398 | 0.2190 |
| lm-5gram-10k (alpha=0.1) | 0.0398 | 0.2190 |
| lm-3gram-30k | 0.0380 | 0.2023 |
| lm-4gram-30k | 0.0379 | 0.2018 |
| lm-5gram-30k | 0.0379 | 0.202 |
| lm-3gram-50k | 0.0348 | 0.1826 |
| lm-4gram-50k | 0.0347 | 0.1818 |
| lm-5gram-50k | 0.0347 | 0.1821 |
| lm-3gram-100k | 0.0310 | 0.1588 |
| lm-4gram-100k | 0.0308 | 0.1579 |
| lm-5gram-100k | 0.0308 | 0.1579 |
| lm-3gram-300k | 0.0261 | 0.1294 |
| lm-4gram-300k | 0.0261 | 0.1293 |
| lm-5gram-300k | 0.0261 | 0.1293 |
| lm-3gram-500k | 0.0248 | 0.1209 |
| lm-4gram-500k | 0.0247 | 0.1207 |
| lm-5gram-500k | 0.0247 | 0.1209 |
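
The CER and WER columns above are character and word error rates. This card does not say which tool or text normalization produced them, so the following is only a minimal sketch of the usual definition (edit distance divided by reference length); the function names `edit_distance`, `wer`, and `cer` are illustrative and not part of this repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# One of two words is wrong, so WER is 0.5.
print(wer("привіт світ", "привіт свт"))
```

Real evaluations usually normalize case and punctuation before scoring, so numbers from this sketch may not match the table exactly.
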

The KenLM model files are available under the **Files and versions** tab.
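
Once a model file is downloaded, it can be used to rescore candidate transcriptions. The sketch below assumes that `alpha` in the table is a language-model weight added to the acoustic score, which this card does not state explicitly; `load_kenlm_scorer` and `rescore` are hypothetical helper names, and the model path is a placeholder. It relies on the `kenlm` Python package (`kenlm.Model` and its `score` method, which returns a log10 probability).

```python
from typing import Callable, List, Tuple


def load_kenlm_scorer(model_path: str) -> Callable[[str], float]:
    """Return a function mapping a sentence to its KenLM log10 probability."""
    import kenlm  # imported lazily so rescore() can be used without kenlm installed

    model = kenlm.Model(model_path)
    return lambda sentence: model.score(sentence, bos=True, eos=True)


def rescore(
    hypotheses: List[Tuple[str, float]],
    lm_score: Callable[[str], float],
    alpha: float = 0.1,
) -> str:
    """Pick the hypothesis maximizing acoustic_score + alpha * lm_score.

    `hypotheses` holds (text, acoustic log score) pairs, for example an
    n-best list from a CTC beam search. The weighting scheme is an assumption.
    """
    best_text, _ = max(hypotheses, key=lambda h: h[1] + alpha * lm_score(h[0]))
    return best_text


# Usage (placeholder path; use a file downloaded from this repository):
# scorer = load_kenlm_scorer("path/to/downloaded-model")
# print(rescore([("привіт світ", -4.0), ("привіт свт", -3.9)], scorer))
```

In practice the language model is usually applied during beam search rather than as a separate rescoring pass; the n-best form is used here only because it is short.
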
## Attribution
- Chaplynskyi, D. et al. (2021) lang-uk Ukrainian Ubercorpus [Data set]. https://lang.org.ua/uk/corpora/#anchor4