ales committed on
Commit 5dee002 · 1 Parent(s): be307cc

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -45,11 +45,11 @@ model-index:
  Fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on `mozilla-foundation/common_voice_8_0 be` dataset.
 
  `Train`, `Dev`, `Test` splits were used as they are present in the dataset. No additional data was used from `Validated` split,
- only 1 voicing of each sentence were used - the way the data was split by CommonVoice.
- To build a better model one can use additional voicings from `Validated` split for sentences already present in `Train`, `Dev`, `Test` splits,
- i.e. enlarge mentioned split.
+ only 1 voicing of each sentence was used - the way the data was split by [CommonVoice CorporaCreator](https://github.com/common-voice/CorporaCreator).
+ To build a better model **one can use additional voicings from `Validated` split** for sentences already present in `Train`, `Dev`, `Test` splits,
+ i.e. enlarge mentioned splits.
 
  Language model was built using [KenLM](https://kheafield.com/code/kenlm/estimation/).
- 5-gram Language model was build on sentences from `Train` + (`Other` - `Dev` - `Test`) splits of `mozilla-foundation/common_voice_8_0 be` dataset.
+ 5-gram Language model was built on sentences from `Train + (Other - Dev - Test)` splits of `mozilla-foundation/common_voice_8_0 be` dataset.
 
  Source code is available [here](https://github.com/yks72p/stt_be).
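
The `Train + (Other - Dev - Test)` expression in the README can be read as plain set arithmetic over sentences. The sketch below is one possible reading and is not taken from the linked repository: the function name `build_lm_corpus`, the deduplication and the sorted output are all assumptions made for illustration.

```python
from typing import Iterable, List


def build_lm_corpus(
    train: Iterable[str],
    other: Iterable[str],
    dev: Iterable[str],
    test: Iterable[str],
) -> List[str]:
    """Return the sentences in Train + (Other - Dev - Test)."""
    # Sentences the model is evaluated on; the language model must not see them.
    held_out = set(dev) | set(test)
    # Extra text from `Other`, minus every held-out sentence.
    extra = set(other) - held_out
    # Assumption: duplicates are dropped and the output is sorted
    # so that repeated runs produce the same file.
    return sorted(set(train) | extra)


# Toy sentences standing in for the real dataset splits.
corpus = build_lm_corpus(
    train=["a b", "c d"],
    other=["c d", "e f", "g h"],
    dev=["e f"],
    test=["i j"],
)
print("\n".join(corpus))
```

Written one sentence per line, such a corpus could then be given to KenLM's `lmplz -o 5` to estimate a 5-gram model. Whether the original pipeline normalizes or deduplicates text before that step is not stated in the README.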