FastSpeech2 multi-speaker, English language (LibriTTS)

Prepare

Everything is done from the main repo folder, i.e. TensorFlowTTS/

  1. Optional* Download and prepare LibriTTS (a helper notebook is in examples/fastspeech2_libritts/libri_experiment/prepare_libri.ipynb); a rough flattening sketch is also given after this list.
  • Dataset structure after finishing this step:
    |- TensorFlowTTS/
    |  |- LibriTTS/
    |  |  |- train-clean-100/
    |  |  |- SPEAKERS.txt
    |  |  |- ...
    |  |- libritts/
    |  |  |- 200/
    |  |  |  |- 200_124139_000001_000000.txt
    |  |  |  |- 200_124139_000001_000000.wav
    |  |  |  |- ...
    |  |  |- 250/
    |  |  |- ...
    |  |- tensorflow_tts/
    |     |- models/
    |     |- ...

  2. Extract durations (use examples/mfa_extraction or a pretrained Tacotron 2)
  3. Optional* Build the docker image
  • bash examples/fastspeech2_libritts/scripts/build.sh

  4. Optional* Run the docker container
  • bash examples/fastspeech2_libritts/scripts/interactive.sh
    
  5. Preprocessing:
  • tensorflow-tts-preprocess --rootdir ./libritts \
      --outdir ./dump_libritts \
      --config preprocess/libritts_preprocess.yaml \
      --dataset libritts
    
  6. Normalization:
  • tensorflow-tts-normalize --rootdir ./dump_libritts \
      --outdir ./dump_libritts \
      --config preprocess/libritts_preprocess.yaml \
      --dataset libritts
    
  7. Change the CharactorDurationF0EnergyMelDataset speaker mapper in fastspeech2_dataset to match your dataset (if you use LibriTTS with mfa_extraction you don't need to change anything); see the speaker-map sketch after this list.
  8. Change train_libri.sh to match your dataset and run:
  • bash examples/fastspeech2_libritts/scripts/train_libri.sh

  9. Optional* If you have problems with tensor size mismatches, check step 5 in the examples/mfa_extraction directory.
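
The dataset layout in step 1 is normally produced by the prepare_libri.ipynb helper. As a rough illustration only, the sketch below flattens LibriTTS train-clean-100 (speaker/chapter folders with *.normalized.txt transcripts) into the per-speaker libritts/ layout shown above; the source/target paths and the use of the .normalized.txt transcripts are assumptions, not the notebook's exact behaviour.

    # Sketch: flatten LibriTTS train-clean-100 into libritts/<speaker>/<utt>.{wav,txt}
    # Assumption: transcripts are taken from the *.normalized.txt files shipped with LibriTTS.
    import shutil
    from pathlib import Path

    src = Path("LibriTTS/train-clean-100")  # assumed source location
    dst = Path("libritts")                  # target layout used by this example

    for wav in src.glob("*/*/*.wav"):
        speaker = wav.name.split("_")[0]    # e.g. "200" from 200_124139_000001_000000.wav
        out_dir = dst / speaker
        out_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(wav, out_dir / wav.name)
        # write <utt>.txt next to the copied wav, taken from <utt>.normalized.txt
        norm = wav.with_name(wav.stem + ".normalized.txt")
        if norm.exists():
            (out_dir / (wav.stem + ".txt")).write_text(
                norm.read_text(encoding="utf-8").strip() + "\n", encoding="utf-8"
            )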
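
For step 7, the dataset class needs a mapping from speaker names to integer IDs. Below is a minimal sketch, assuming speaker names are simply the top-level folder names under ./libritts; the output file name speakers_map.json and the exact mapping shape expected by CharactorDurationF0EnergyMelDataset are assumptions, so check fastspeech2_dataset.py for the real interface.

    # Sketch: build a speaker-name -> integer-id mapping from the ./libritts folder layout.
    # Assumption: "speakers_map.json" and its {name: id} shape are illustrative only; the
    # dataset class in fastspeech2_dataset.py defines what is actually expected.
    import json
    from pathlib import Path

    speakers = sorted(p.name for p in Path("libritts").iterdir() if p.is_dir())
    speakers_map = {name: idx for idx, name in enumerate(speakers)}

    with open("speakers_map.json", "w") as f:
        json.dump(speakers_map, f, indent=2)

    print(len(speakers_map), "speakers, e.g.", list(speakers_map.items())[:3])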

Comments

This version uses the popular '|'-separated train.txt format used in other repos. Training files should look like this:

Wav Path | Text | Speaker Name

Wav Path2 | Text | Speaker Name
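
As an illustration of that format, the sketch below reads such a pipe-separated metadata file into (wav_path, text, speaker_name) tuples; the file name train.txt and the whitespace stripping are assumptions.

    # Sketch: parse a '|'-separated training file into (wav_path, text, speaker_name) tuples.
    # Assumption: the metadata file is named train.txt and has three fields per line.
    from pathlib import Path

    def load_metadata(path="train.txt"):
        items = []
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            wav_path, text, speaker = (field.strip() for field in line.split("|", maxsplit=2))
            items.append((wav_path, text, speaker))
        return items

    if __name__ == "__main__":
        for wav_path, text, speaker in load_metadata()[:3]:
            print(speaker, wav_path, text)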