
data_prep

This directory contains the following data preparation scripts:

  1. MFA data preparation: Code for extracting phone alignments with the Montréal Forced Aligner (MFA).
  2. Style prompt data preparation: Code for preparing synthetic annotations of style prompts.

0. Download LibriTTS-R

Before running any scripts, place the LibriTTS-R dataset in ./LibriTTS_R. The directory must have the following structure:

LibriTTS_R/
├── BOOKS.txt
├── CHAPTERS.txt
├── LICENSE.txt
├── NOTE.txt
├── README_librispeech.txt
├── README_libritts.txt
├── README_libritts_r.txt
├── SPEAKERS.txt
├── dev-clean
├── dev-other
├── reader_book.tsv
├── speakers.tsv
├── test-clean
├── test-other
├── train-clean-100
├── train-clean-360
└── train-other-500
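
Since the scripts expect this exact layout, it can help to check it before starting a long run. The following is a minimal sketch, not part of this repository: `check_libritts_r` is a hypothetical helper that only verifies that the subset directories listed above exist.

```shell
# Hypothetical helper (not part of this repository): verify that the
# LibriTTS-R subset directories from the tree above are present.
check_libritts_r() {
  local root="${1:-./LibriTTS_R}"
  local missing=0
  local subset
  for subset in dev-clean dev-other test-clean test-other \
                train-clean-100 train-clean-360 train-other-500; do
    if [ ! -d "${root}/${subset}" ]; then
      echo "missing: ${root}/${subset}" >&2
      missing=1
    fi
  done
  # Exit status 0 if every subset directory exists, 1 otherwise.
  return "${missing}"
}
```

Calling `check_libritts_r ./LibriTTS_R` prints each missing directory to stderr and returns a non-zero status if any is absent.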

1. MFA data preparation

Setup for MFA

conda install -c conda-forge montreal-forced-aligner
mfa model download dictionary english_us_arpa
mfa model download acoustic english_us_arpa

Usage

See runall_mfa.sh for usage.

Note that running MFA for all the utterances in LibriTTS-R takes a long time (likely a few days).
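
For orientation, a single alignment call with the models downloaded above generally has the form `mfa align CORPUS_DIRECTORY DICTIONARY ACOUSTIC_MODEL OUTPUT_DIRECTORY`. The sketch below wraps that call in a hypothetical function, `mfa_align_dir`; the directory arguments and the job count are assumptions for illustration, and runall_mfa.sh remains the authoritative reference for how this repository invokes MFA.

```shell
# Hypothetical wrapper (not part of this repository): run, or with DRY_RUN=1
# only print, one `mfa align` call using the english_us_arpa models.
mfa_align_dir() {
  local corpus_dir="$1"   # directory with audio and matching transcript files
  local out_dir="$2"      # directory where MFA writes its alignments
  local jobs="${3:-4}"    # number of parallel jobs (assumed default)
  set -- mfa align -j "${jobs}" "${corpus_dir}" \
         english_us_arpa english_us_arpa "${out_dir}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"             # show the command without running it
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 mfa_align_dir <corpus_dir> <out_dir>` first is a cheap way to inspect the command before committing to a multi-day job.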

Directory structure

After all the data preparation steps, the following directories will be created:

  • libritts_r_per_spk_cleaned
    • ${spk}
      • textgrid: TextGrid files
      • wav24k: 24 kHz WAV files

├── 100
│   ├── textgrid
│   └── wav24k
├── 1001
│   ├── textgrid
│   └── wav24k
├── 1006
│   ├── textgrid
│   └── wav24k
...
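
Given the layout above, a quick per-speaker file count can reveal utterances that failed to align. This is a minimal sketch under assumptions: `report_speaker_dirs` and `count_files` are hypothetical helpers, and the default root path `./libritts_r_per_spk_cleaned` is assumed to be relative to the current directory.

```shell
# Hypothetical helpers (not part of this repository).
# count_files prints the number of regular files under a directory (0 if absent).
count_files() {
  if [ -d "$1" ]; then
    find "$1" -type f | wc -l | tr -d ' '
  else
    echo 0
  fi
}

# Print "<speaker> <files in wav24k> <files in textgrid>" for each speaker directory.
report_speaker_dirs() {
  local root="${1:-./libritts_r_per_spk_cleaned}"
  local spk_dir spk
  for spk_dir in "${root}"/*/; do
    [ -d "${spk_dir}" ] || continue
    spk="$(basename "${spk_dir}")"
    echo "${spk} $(count_files "${spk_dir}wav24k") $(count_files "${spk_dir}textgrid")"
  done
}
```

A speaker whose two counts differ has audio files without a corresponding alignment, or the reverse.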

2. Style prompt data preparation

Code for estimating per-utterance style tags (e.g., low pitch, normal pitch and high pitch) from the data statistics.
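
To illustrate what deriving tags from data statistics can look like, the sketch below bins a per-utterance value into low, normal, or high relative to the corpus mean and standard deviation. Everything here is an assumption for illustration: the helper name `tag_by_stats`, the two-column input format (`utt_id value`), and the mean ± k·std thresholds are hypothetical and may differ from the rule implemented in this repository.

```shell
# Hypothetical helper (not part of this repository).
# Input file: one "utt_id value" pair per line (e.g. a mean F0 per utterance).
# Output: "utt_id <low|normal|high> <attribute>" per line.
tag_by_stats() {
  local stats_file="$1"
  local attribute="${2:-pitch}"
  local k="${3:-1}"        # threshold width in standard deviations (assumed)
  awk -v k="${k}" -v name="${attribute}" '
    # Pass 1: accumulate sums for the mean and variance.
    NR == FNR { sum += $2; sumsq += $2 * $2; n++; next }
    # Start of pass 2: finalize the statistics.
    FNR == 1 {
      mean = sum / n
      var = sumsq / n - mean * mean
      if (var < 0) var = 0
      std = sqrt(var)
    }
    # Pass 2: assign a tag to each utterance.
    {
      if ($2 < mean - k * std)      tag = "low"
      else if ($2 > mean + k * std) tag = "high"
      else                          tag = "normal"
      print $1, tag, name
    }
  ' "${stats_file}" "${stats_file}"
}
```

The file is read twice so that the thresholds come from the whole corpus before any utterance is tagged.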

Usage

See runall_style_prompt_tags.sh for usage.