Yurii Paniv committed
Commit 7fe4d31 · 1 parent: 03f568d
Add instructions for dataset preparation
scripts/README.md ADDED (+13 -0)
@@ -0,0 +1,13 @@
# How to prepare the dataset for training

1. Download the Ukrainian dataset from [https://github.com/egorsmkv/speech-recognition-uk](https://github.com/egorsmkv/speech-recognition-uk).
2. Delete the Common Voice folder from the downloaded dataset.
3. Download [import_ukrainian.py](scripts/import_ukrainian.py) and put it into the `DeepSpeech/bin` folder.
4. Run the import script.
5. Download the Common Voice 6.1 Ukrainian dataset.
6. Convert it to the DeepSpeech format.
7. Merge the `train.csv` from the dataset and the `train.csv` from DeepSpeech into one file.
8. Put the CV files into the dataset files folder.
9. Put `dev.csv` and `test.csv` into the same folder.
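
The merge in step 7 could be sketched roughly as follows. This is only an illustration, not the author's script: it assumes both files use the usual DeepSpeech CSV columns (`wav_filename`, `wav_filesize`, `transcript`), and the function name `merge_train_csvs` is hypothetical.

```python
import csv

# Column layout assumed for DeepSpeech training CSVs.
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]


def merge_train_csvs(inputs, output):
    """Concatenate several CSV files into one, writing a single header row.

    Returns the number of data rows written.
    """
    rows_written = 0
    with open(output, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.DictWriter(out_file, fieldnames=FIELDNAMES)
        writer.writeheader()
        for path in inputs:
            with open(path, newline="", encoding="utf-8") as in_file:
                for row in csv.DictReader(in_file):
                    # Keep only the expected columns, in a fixed order.
                    writer.writerow({name: row[name] for name in FIELDNAMES})
                    rows_written += 1
    return rows_written
```

It would be called with the two `train.csv` paths and a destination path, for example `merge_train_csvs(["dataset/train.csv", "cv/train.csv"], "merged/train.csv")`, where all three paths are placeholders. If the audio files are moved afterwards (step 8), the `wav_filename` values may need adjusting to match.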

You have a reproducible dataset!
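
Once the files are in place, a quick sanity check may be useful: confirm that every audio file listed in a CSV actually exists. The sketch below is an assumption-laden illustration, not part of the original instructions. It assumes a `wav_filename` column and that relative paths are resolved against the folder containing the CSV; `find_missing_audio` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def find_missing_audio(csv_path):
    """Return the wav_filename entries in csv_path that are not found on disk."""
    csv_path = Path(csv_path)
    missing = []
    with open(csv_path, newline="", encoding="utf-8") as csv_file:
        for row in csv.DictReader(csv_file):
            wav = Path(row["wav_filename"])
            # Assumption: relative paths are relative to the CSV's own folder.
            if not wav.is_absolute():
                wav = csv_path.parent / wav
            if not wav.is_file():
                missing.append(row["wav_filename"])
    return missing
```

Running it on `train.csv`, `dev.csv`, and `test.csv` in turn should return an empty list for each file if steps 8 and 9 were completed as intended.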