Spaces:

line-corporation
/

promptttspp

Running

File size: 1,915 Bytes

82334b0

# data_prep

This directory contains the following data preparation scripts:

1. MFA data preparation: Code for extracting phone alignments by Montréal Forced Aligner (MFA)
2. Style prompt data preparation: Code for preparing synthetic annotations of style prompts.

## 0. Download LibriTTS_R

Before running any scripts, be sure to put the [LibriTTS-R](https://www.openslr.org/141/) dataset to `./LibriTTS_R`. You must have the following directory structure:

```
LibriTTS_R/
├── BOOKS.txt
├── CHAPTERS.txt
├── LICENSE.txt
├── NOTE.txt
├── README_librispeech.txt
├── README_libritts.txt
├── README_libritts_r.txt
├── SPEAKERS.txt
├── dev-clean
├── dev-other
├── reader_book.tsv
├── speakers.tsv
├── test-clean
├── test-other
├── train-clean-100
├── train-clean-360
└── train-other-500
```

## 1. MFA data preparation

### Setup for MFA

```
conda install -c conda-forge montreal-forced-aligner
```

```
mfa model download dictionary english_us_arpa
mfa model download acoustic english_us_arpa
```

### Usage

Please check `runall_mfa.sh` for the usage.

Note that running MFA for all the utterances in LibriTTS-R takes a long time (likely a few days).


### Directory structure

After all the data preparation steps, the following directories will be created:

- `libritts_r_per_spk_cleaned`
  - `${spk}`
    - `textgrid`: text grid files
    - `wav24k`: 24kHz wav files

```
├── 100
│   ├── textgrid
│   └── wav24k
├── 1001
│   ├── textgrid
│   └── wav24k
├── 1006
│   ├── textgrid
│   └── wav24k
...
```


## 2. Style prompt data preparation

Code for estimating per-utterance style tags (e.g., low pitch, normal pitch and high pitch) from the data statistics.

### Usage

Please check `runall_style_prompt_tags.sh` for the usage.