---
license: cc-by-nc-sa-4.0
language: 
  - uk
tags:
- automatic-speech-recognition
---

## Community

- Discord: https://bit.ly/discord-uds
- Speech Recognition: https://t.me/speech_recognition_uk
- Speech Synthesis: https://t.me/speech_synthesis_uk
- Natural Language Processing: https://t.me/nlp_uk

## Overview

KenLM n-gram language models for Ukrainian, provided as 3-gram, 4-gram, and 5-gram variants in sizes labeled 10k through 500k.

## Metrics

Evaluated with the [w2v-xls-r-uk](https://huggingface.co/Yehor/w2v-xls-r-uk) acoustic model. CER is the character error rate and WER is the word error rate; lower is better:

| Model | CER | WER |
|-|-|-|
| no LM | 0.0412 | 0.2206 |
| lm-3gram-10k (alpha=0.1) | 0.0398 | 0.2191 |
| lm-4gram-10k (alpha=0.1) | 0.0398 | 0.219 |
| lm-5gram-10k (alpha=0.1) | 0.0398 | 0.219 |
| lm-3gram-30k | 0.038 | 0.2023 |
| lm-4gram-30k | 0.0379 | 0.2018 |
| lm-5gram-30k | 0.0379 | 0.202 |
| lm-3gram-50k | 0.0348 | 0.1826 |
| lm-4gram-50k | 0.0347 | 0.1818 |
| lm-5gram-50k | 0.0347 | 0.1821 |
| lm-3gram-100k | 0.031 | 0.1588 |
| lm-4gram-100k | 0.0308 | 0.1579 |
| lm-5gram-100k | 0.0308 | 0.1579 |
| lm-3gram-300k | 0.0261 | 0.1294 |
| lm-4gram-300k | 0.0261 | 0.1293 |
| lm-5gram-300k | 0.0261 | 0.1293 |
| lm-3gram-500k | 0.0248 | 0.1209 |
| lm-4gram-500k | 0.0247 | 0.1207 |
| lm-5gram-500k | 0.0247 | 0.1209 |
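
To put the table in perspective, the relative error reduction of a model against the "no LM" baseline can be computed as a rough sketch like the one below. The numbers are copied from the "no LM" and "lm-4gram-500k" rows; the helper name `relative_reduction` is hypothetical and only serves this illustration.

```python
# Metric values copied from the table above (fractions, not percentages).
BASELINE = {"cer": 0.0412, "wer": 0.2206}  # "no LM" row
BEST = {"cer": 0.0247, "wer": 0.1207}      # "lm-4gram-500k" row


def relative_reduction(baseline: float, value: float) -> float:
    """Return the relative error reduction of `value` against `baseline`.

    Hypothetical helper for illustration: 0.0 means no change,
    0.5 means the error rate was halved.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - value) / baseline


wer_gain = relative_reduction(BASELINE["wer"], BEST["wer"])
cer_gain = relative_reduction(BASELINE["cer"], BEST["cer"])

print(f"WER reduction: {wer_gain:.1%}")  # WER reduction: 45.3%
print(f"CER reduction: {cer_gain:.1%}")  # CER reduction: 40.0%
```

Under these figures, the largest models cut WER by roughly 45% and CER by roughly 40% relative to decoding without a language model, while the n-gram order (3, 4, or 5) changes the result far less than the model size does.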

The KenLM model files are available in the "Files and versions" tab of this repository.

## Attribution

- Chaplynskyi, D. et al. (2021) lang-uk Ukrainian Ubercorpus [Data set]. https://lang.org.ua/uk/corpora/#anchor4