Update README.md
README.md
CHANGED
@@ -45,11 +45,11 @@ model-index:
 Fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on `mozilla-foundation/common_voice_8_0 be` dataset.
 
 `Train`, `Dev`, `Test` splits were used as they are present in the dataset. No additional data was used from `Validated` split,
-only 1 voicing of each sentence
-To build a better model one can use additional voicings from `Validated` split for sentences already present in `Train`, `Dev`, `Test` splits,
-i.e. enlarge mentioned
+only 1 voicing of each sentence was used - the way the data was split by [CommonVoice CorporaCreator](https://github.com/common-voice/CorporaCreator).
+To build a better model **one can use additional voicings from `Validated` split** for sentences already present in `Train`, `Dev`, `Test` splits,
+i.e. enlarge mentioned splits.
 
 Language model was built using [KenLM](https://kheafield.com/code/kenlm/estimation/).
-5-gram Language model was
+5-gram Language model was built on sentences from `Train + (Other - Dev - Test)` splits of `mozilla-foundation/common_voice_8_0 be` dataset.
 
 Source code is available [here](https://github.com/yks72p/stt_be).
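
The two data-selection rules described in the updated README (enlarging a split with extra voicings from `Validated`, and building the language-model corpus from `Train + (Other - Dev - Test)`) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the linked repository: the function names `enlarge_split` and `lm_corpus` are hypothetical, each clip is assumed to be a dict with `path` and `sentence` keys, and sentences are compared as exact strings.

```python
from typing import Dict, Iterable, List

# Assumed row shape (hypothetical): {"path": clip file name, "sentence": transcript}
Row = Dict[str, str]


def enlarge_split(split_rows: List[Row], validated_rows: Iterable[Row]) -> List[Row]:
    """Add Validated clips whose sentence already occurs in the given split."""
    sentences = {row["sentence"] for row in split_rows}
    seen_paths = {row["path"] for row in split_rows}
    enlarged = list(split_rows)
    for row in validated_rows:
        # Keep only new voicings of sentences that the split already contains.
        if row["sentence"] in sentences and row["path"] not in seen_paths:
            enlarged.append(row)
            seen_paths.add(row["path"])
    return enlarged


def lm_corpus(train: Iterable[str], other: Iterable[str],
              dev: Iterable[str], test: Iterable[str]) -> List[str]:
    """Return the unique sentences of Train + (Other - Dev - Test), sorted."""
    held_out = set(dev) | set(test)
    # Sentences from Other are dropped if they also occur in Dev or Test.
    extra = set(other) - held_out
    return sorted(set(train) | extra)
```

The list returned by `lm_corpus` could then be written one sentence per line and passed to KenLM's `lmplz -o 5` to estimate a 5-gram model; any text normalization applied before that step is not described in the README and is left out here.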