# data_prep

This directory contains the following data preparation scripts:

1. MFA data preparation: Code for extracting phone alignments with the Montréal Forced Aligner (MFA).
2. Style prompt data preparation: Code for preparing synthetic annotations of style prompts.

## 0. Download LibriTTS_R

Before running any scripts, download the [LibriTTS-R](https://www.openslr.org/141/) dataset and place it in `./LibriTTS_R`. The directory must have the following structure:

```
LibriTTS_R/
├── BOOKS.txt
├── CHAPTERS.txt
├── LICENSE.txt
├── NOTE.txt
├── README_librispeech.txt
├── README_libritts.txt
├── README_libritts_r.txt
├── SPEAKERS.txt
├── dev-clean
├── dev-other
├── reader_book.tsv
├── speakers.tsv
├── test-clean
├── test-other
├── train-clean-100
├── train-clean-360
└── train-other-500
```
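
A quick check of the layout above can catch a misplaced download before the long-running scripts start. The following is a minimal sketch, not part of this repository: the helper name `missing_subsets` is hypothetical, and it only checks the subset directories shown in the listing.

```python
from pathlib import Path

# Subset directories taken from the listing above.
EXPECTED_SUBSETS = [
    "dev-clean", "dev-other", "test-clean", "test-other",
    "train-clean-100", "train-clean-360", "train-other-500",
]


def missing_subsets(root):
    """Return the expected subset directories that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_SUBSETS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subsets("./LibriTTS_R")
    print("missing subsets:", ", ".join(missing) if missing else "none")
```

An empty result means every subset directory is in place.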

## 1. MFA data preparation

### Setup for MFA

```
conda install -c conda-forge montreal-forced-aligner
```

```
mfa model download dictionary english_us_arpa
mfa model download acoustic english_us_arpa
```

### Usage

See `runall_mfa.sh` for usage.

Note that running MFA for all the utterances in LibriTTS-R takes a long time (likely a few days).


### Directory structure

After all the data preparation steps, the following directories will be created:

- `libritts_r_per_spk_cleaned`
  - `${spk}`
    - `textgrid`: TextGrid files
    - `wav24k`: 24 kHz WAV files

```
├── 100
│   ├── textgrid
│   └── wav24k
├── 1001
│   ├── textgrid
│   └── wav24k
├── 1006
│   ├── textgrid
│   └── wav24k
...
```
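
Because the alignment run is long, it can help to confirm that each speaker directory ended up with matching outputs. The sketch below is an illustration rather than part of the provided scripts: `count_files_per_speaker` is a hypothetical helper, and it assumes each utterance produces one file in `textgrid` and one in `wav24k`, which this README does not state explicitly.

```python
from pathlib import Path


def _count_files(directory):
    """Count regular files directly inside directory (0 if it does not exist)."""
    if not directory.is_dir():
        return 0
    return sum(1 for p in directory.iterdir() if p.is_file())


def count_files_per_speaker(root):
    """Map each speaker ID to (textgrid file count, wav24k file count)."""
    counts = {}
    for spk_dir in sorted(Path(root).iterdir()):
        if spk_dir.is_dir():
            counts[spk_dir.name] = (
                _count_files(spk_dir / "textgrid"),
                _count_files(spk_dir / "wav24k"),
            )
    return counts


if __name__ == "__main__":
    root = Path("libritts_r_per_spk_cleaned")
    if root.is_dir():
        for spk, (n_tg, n_wav) in count_files_per_speaker(root).items():
            flag = "" if n_tg == n_wav else "  <- counts differ"
            print(f"{spk}: {n_tg} textgrid, {n_wav} wav24k{flag}")
```

Under the one-file-per-utterance assumption, a speaker whose two counts differ likely has utterances that were not aligned or not copied.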


## 2. Style prompt data preparation

Code for estimating per-utterance style tags (e.g., low pitch, normal pitch and high pitch) from the data statistics.
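
As a rough illustration of deriving tags from data statistics, the sketch below splits per-utterance mean F0 values into three bins at the tertiles of the data. This is an assumption made for illustration only: the actual scripts may use different statistics, thresholds, or groupings, and the function name `pitch_tags` and the utterance IDs are hypothetical.

```python
from statistics import quantiles


def pitch_tags(mean_f0_by_utt):
    """Assign a pitch tag to each utterance using tertile cut points."""
    # quantiles(..., n=3) returns the two cut points that split the data in thirds.
    low_cut, high_cut = quantiles(list(mean_f0_by_utt.values()), n=3)
    tags = {}
    for utt, f0 in mean_f0_by_utt.items():
        if f0 < low_cut:
            tags[utt] = "low pitch"
        elif f0 > high_cut:
            tags[utt] = "high pitch"
        else:
            tags[utt] = "normal pitch"
    return tags


if __name__ == "__main__":
    # Hypothetical mean F0 values in Hz, keyed by made-up utterance IDs.
    example = {"a": 100, "b": 110, "c": 150, "d": 160, "e": 220, "f": 230}
    for utt, tag in pitch_tags(example).items():
        print(utt, tag)
```

Cut points computed from the data keep the three tags roughly balanced, whereas fixed thresholds in Hz would not adapt to the distribution of the corpus.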

### Usage

See `runall_style_prompt_tags.sh` for usage.