# data_prep
This directory contains the following data preparation scripts:
1. MFA data preparation: Code for extracting phone alignments with the Montréal Forced Aligner (MFA).
2. Style prompt data preparation: Code for preparing synthetic annotations of style prompts.
## 0. Download LibriTTS_R
Before running any scripts, place the [LibriTTS-R](https://www.openslr.org/141/) dataset in `./LibriTTS_R`. It must have the following directory structure:
```
LibriTTS_R/
├── BOOKS.txt
├── CHAPTERS.txt
├── LICENSE.txt
├── NOTE.txt
├── README_librispeech.txt
├── README_libritts.txt
├── README_libritts_r.txt
├── SPEAKERS.txt
├── dev-clean
├── dev-other
├── reader_book.tsv
├── speakers.tsv
├── test-clean
├── test-other
├── train-clean-100
├── train-clean-360
└── train-other-500
```
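Before running the scripts, a quick sanity check can confirm that the top-level entries listed above are present. The sketch below is illustrative and not part of this directory; `EXPECTED_ENTRIES` and `missing_entries` are hypothetical names, and the check only tests that each entry exists, not that its contents are complete.

```python
from pathlib import Path

# Top-level entries from the expected LibriTTS_R layout above.
EXPECTED_ENTRIES = [
    "BOOKS.txt", "CHAPTERS.txt", "LICENSE.txt", "NOTE.txt",
    "README_librispeech.txt", "README_libritts.txt", "README_libritts_r.txt",
    "SPEAKERS.txt", "dev-clean", "dev-other", "reader_book.tsv", "speakers.tsv",
    "test-clean", "test-other", "train-clean-100", "train-clean-360",
    "train-other-500",
]


def missing_entries(root: Path) -> list[str]:
    """Return the expected top-level entries that do not exist under root.

    Hypothetical helper: it checks existence only, not file contents.
    """
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(Path("./LibriTTS_R"))
    print("missing:", ", ".join(missing) if missing else "none")
```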
## 1. MFA data preparation
### Setup for MFA
```
conda install -c conda-forge montreal-forced-aligner
```
```
mfa model download dictionary english_us_arpa
mfa model download acoustic english_us_arpa
```
### Usage
See `runall_mfa.sh` for usage.
Note that running MFA on all utterances in LibriTTS-R takes a long time (likely a few days).
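`runall_mfa.sh` is the reference for how MFA is actually invoked here. For orientation, the core of an MFA run is an `mfa align CORPUS_DIRECTORY DICTIONARY_PATH ACOUSTIC_MODEL_PATH OUTPUT_DIRECTORY` call, and the pretrained `english_us_arpa` models downloaded above can be passed by name. The minimal sketch below only builds that command for a single speaker. `build_mfa_align_cmd` and the `mfa_corpus/` and `mfa_out/` paths are hypothetical, and the corpus directory is assumed to hold each `.wav` next to a same-stem `.lab` or `.txt` transcript, which is the layout MFA expects.

```python
import shlex
from pathlib import Path


def build_mfa_align_cmd(corpus_dir: Path, output_dir: Path,
                        dictionary: str = "english_us_arpa",
                        acoustic_model: str = "english_us_arpa") -> list[str]:
    """Return argv for `mfa align CORPUS DICTIONARY ACOUSTIC_MODEL OUTPUT`.

    Hypothetical helper: corpus_dir is assumed to contain <utt>.wav files
    with same-stem .lab/.txt transcripts next to them.
    """
    return ["mfa", "align", str(corpus_dir), dictionary, acoustic_model,
            str(output_dir)]


if __name__ == "__main__":
    # Hypothetical per-speaker paths, for illustration only.
    cmd = build_mfa_align_cmd(Path("mfa_corpus/100"), Path("mfa_out/100"))
    print(shlex.join(cmd))
    # To actually run it (requires MFA installed and the models downloaded):
    # import subprocess; subprocess.run(cmd, check=True)
```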
### Directory structure
After all the data preparation steps, the following directories will be created:
- `libritts_r_per_spk_cleaned`
  - `${spk}` (one directory per speaker ID)
    - `textgrid`: TextGrid files
    - `wav24k`: 24 kHz WAV files
```
├── 100
│   ├── textgrid
│   └── wav24k
├── 1001
│   ├── textgrid
│   └── wav24k
├── 1006
│   ├── textgrid
│   └── wav24k
...
```
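Downstream code usually needs each alignment paired with its audio. A minimal sketch, assuming each `textgrid/<utt>.TextGrid` and `wav24k/<utt>.wav` share the utterance ID as their file stem (the exact file naming is an assumption); `paired_utterances` is a hypothetical helper. Utterances that have only one of the two files are skipped, since MFA may fail to align some utterances.

```python
from pathlib import Path


def paired_utterances(spk_dir: Path) -> list[tuple[Path, Path]]:
    """Pair <spk>/textgrid/<utt>.TextGrid with <spk>/wav24k/<utt>.wav by stem.

    Assumes matching stems and these extensions; utterances missing either
    file are skipped. Results are sorted by TextGrid path.
    """
    wavs = {p.stem: p for p in (spk_dir / "wav24k").glob("*.wav")}
    pairs = []
    for tg in sorted((spk_dir / "textgrid").glob("*.TextGrid")):
        wav = wavs.get(tg.stem)
        if wav is not None:
            pairs.append((tg, wav))
    return pairs


if __name__ == "__main__":
    root = Path("libritts_r_per_spk_cleaned")
    if root.is_dir():
        for spk_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            print(f"{spk_dir.name}: {len(paired_utterances(spk_dir))} aligned utterances")
```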
## 2. Style prompt data preparation
Code for estimating per-utterance style tags (e.g., low pitch, normal pitch, and high pitch) from statistics computed over the data.
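The exact statistics and thresholds used by the scripts are not described here. As one plausible illustration only, the sketch below assigns pitch tags by splitting per-utterance mean F0 values at their tertiles (the 1/3 and 2/3 quantiles). The input dictionary, the tertile rule, and `pitch_tags` are all assumptions, and computing mean F0 itself is out of scope.

```python
import statistics


def pitch_tags(mean_f0: dict[str, float]) -> dict[str, str]:
    """Map utterance ID -> 'low pitch' / 'normal pitch' / 'high pitch'.

    Hypothetical rule: values below the 1/3 quantile are 'low', values above
    the 2/3 quantile are 'high', everything in between is 'normal'.
    Needs at least two values for statistics.quantiles.
    """
    low_cut, high_cut = statistics.quantiles(mean_f0.values(), n=3)
    tags = {}
    for utt, f0 in mean_f0.items():
        if f0 < low_cut:
            tags[utt] = "low pitch"
        elif f0 > high_cut:
            tags[utt] = "high pitch"
        else:
            tags[utt] = "normal pitch"
    return tags


if __name__ == "__main__":
    # Toy mean-F0 values in Hz, for illustration only.
    example = {"utt_a": 95.0, "utt_b": 120.0, "utt_c": 180.0,
               "utt_d": 210.0, "utt_e": 260.0}
    for utt, tag in pitch_tags(example).items():
        print(utt, tag)
```

Cut points from global quantiles give roughly equal-sized tag groups; computing them per speaker or per gender instead would describe pitch relative to that group, which may or may not match what the repository does.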
### Usage
See `runall_style_prompt_tags.sh` for usage.