Yurii Paniv committed on
Commit
7fe4d31
1 Parent(s): 03f568d

Add instructions for dataset preparation

Files changed (1)
  1. scripts/README.md +13 -0
scripts/README.md ADDED
@@ -0,0 +1,13 @@
+ # How to prepare the dataset for training
+
+ 1. Download the Ukrainian dataset from [https://github.com/egorsmkv/speech-recognition-uk](https://github.com/egorsmkv/speech-recognition-uk).
+ 2. Delete the Common Voice folder in the dataset.
+ 3. Download [import_ukrainian.py](scripts/import_ukrainian.py) and put it into the DeepSpeech/bin folder.
+ 4. Run the import script.
+ 5. Download the Common Voice 6.1 Ukrainian dataset.
+ 6. Convert it to the DeepSpeech format.
+ 7. Merge train.csv from the dataset and from DeepSpeech into one file.
+ 8. Put the CV files into the dataset's files folder.
+ 9. Put dev.csv and test.csv into the folder.
+
+ You now have a reproducible dataset!
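
The merge in step 7 is not spelled out in the README, so here is a minimal sketch of one way to do it. It assumes both `train.csv` files use the standard DeepSpeech column layout (`wav_filename`, `wav_filesize`, `transcript`). The function name `merge_csvs` is hypothetical, and skipping rows whose `wav_filename` was already seen is an added assumption, not something the commit asks for.

```python
import csv

# Column layout used by DeepSpeech training CSVs.
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]


def merge_csvs(sources, destination):
    """Concatenate DeepSpeech-style CSVs into one file.

    The header is written once. Rows whose wav_filename has already been
    written are skipped (an assumption made for this sketch).
    Returns the number of data rows written.
    """
    seen = set()
    rows_written = 0
    with open(destination, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
        writer.writeheader()
        for source in sources:
            with open(source, newline="", encoding="utf-8") as src:
                for row in csv.DictReader(src):
                    name = row["wav_filename"]
                    if name in seen:
                        continue
                    seen.add(name)
                    # Keep only the three expected columns, in a fixed order.
                    writer.writerow({key: row[key] for key in FIELDNAMES})
                    rows_written += 1
    return rows_written
```

A call might look like `merge_csvs(["dataset/train.csv", "cv/train.csv"], "train_merged.csv")`; all three paths are placeholders for wherever the files end up on your machine. Note that `wav_filename` entries are often relative paths, so they may need adjusting after the Common Voice files are moved in step 8.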