
[Back]

VCTK

VCTK is an open multi-speaker English speech corpus. We provide examples for building Transformer models on this dataset.

Data preparation

Download data, create splits and generate audio manifests with

python -m examples.speech_synthesis.preprocessing.get_vctk_audio_manifest \
  --output-data-root ${AUDIO_DATA_ROOT} \
  --output-manifest-root ${AUDIO_MANIFEST_ROOT}
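The generated audio manifests are TSV files, one row per utterance. As a rough sketch of what the downstream scripts consume, here is a minimal reader; the column names below (`id`, `audio`, `n_frames`, `speaker`) are an assumption modeled on fairseq's speech TSV manifests, not taken from this document.

```python
import csv
import io

# Hypothetical manifest content; column names and values are illustrative
# assumptions, not the exact output of get_vctk_audio_manifest.
tsv = (
    "id\taudio\tn_frames\tspeaker\n"
    "p225_001\tp225/p225_001.wav\t52000\tp225\n"
)

rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
for row in rows:
    # Each row pairs an utterance id with its audio path and speaker.
    print(row["id"], row["audio"], row["speaker"])
```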

Then, extract log-Mel spectrograms, generate feature manifest and create data configuration YAML with

python -m examples.speech_synthesis.preprocessing.get_feature_manifest \
  --audio-manifest-root ${AUDIO_MANIFEST_ROOT} \
  --output-root ${FEATURE_MANIFEST_ROOT} \
  --ipa-vocab --use-g2p
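A log-Mel spectrogram warps linear frequency onto the perceptual mel scale before taking the log of the filterbank energies. As a minimal sketch of that warping (the HTK-style formula; the exact variant, sample rate, and bin count used by `get_feature_manifest` are not specified here):

```python
import math

def hz_to_mel(f_hz: float) -> float:
    # HTK-style mel scale: 2595 * log10(1 + f / 700).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    # Inverse of the mapping above.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Filterbank edges spaced evenly on the mel scale up to Nyquist
# (22.05 kHz audio is an assumption for illustration).
nyquist = 22050 / 2
n_mels = 80
edges_hz = [mel_to_hz(hz_to_mel(nyquist) * i / (n_mels + 1))
            for i in range(n_mels + 2)]
```

Evenly spaced mel bins become progressively wider in Hz, which is why low frequencies get finer resolution than high ones.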

where we use phoneme inputs (--ipa-vocab --use-g2p) as an example.
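With --use-g2p, the English text is converted to a phoneme sequence by a grapheme-to-phoneme (G2P) model before vocabulary building. The toy sketch below shows the idea with a hand-written lookup table; the lexicon entries and the `<unk>` fallback are invented for illustration, whereas a real G2P model also predicts pronunciations for out-of-vocabulary words.

```python
# Toy G2P via dictionary lookup. The lexicon below is a made-up example,
# not the pronunciation inventory the preprocessing script actually uses.
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "corpus": ["K", "AO", "R", "P", "AH", "S"],
}

def g2p(text):
    phonemes = []
    for word in text.lower().split():
        # Fall back to an unknown symbol for words outside the lexicon.
        phonemes.extend(LEXICON.get(word, ["<unk>"]))
    return phonemes

print(g2p("speech corpus"))
```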

To denoise audio and trim leading/trailing silence using signal-processing-based VAD, run

for SPLIT in dev test train; do
    python -m examples.speech_synthesis.preprocessing.denoise_and_vad_audio \
      --audio-manifest ${AUDIO_MANIFEST_ROOT}/${SPLIT}.audio.tsv \
      --output-dir ${PROCESSED_DATA_ROOT} \
      --denoise --vad --vad-agg-level 3
done
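The trimming step can be sketched as a simple energy-based VAD: split the waveform into frames, keep only the span from the first to the last frame whose energy clears a threshold. This is a minimal illustration operating on a plain list of PCM samples; the actual script's VAD is more robust and its aggressiveness is tuned via --vad-agg-level (3 being the most aggressive setting).

```python
def trim_silence(samples, frame_len=160, threshold=1e4):
    # Mean squared energy per frame (frame_len and threshold are
    # illustrative values, not the script's actual parameters).
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    energies = [sum(s * s for s in f) / max(len(f), 1) for f in frames]
    # Indices of frames considered "voiced".
    voiced = [i for i, e in enumerate(energies) if e >= threshold]
    if not voiced:
        return []
    # Keep everything from the first to the last voiced frame.
    start, end = voiced[0] * frame_len, (voiced[-1] + 1) * frame_len
    return samples[start:end]

audio = [0] * 320 + [2000] * 320 + [0] * 320  # silence, speech, silence
print(len(trim_silence(audio)))  # 320
```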

Training

(Please refer to the LJSpeech example.)

Inference

(Please refer to the LJSpeech example.)

Automatic Evaluation

(Please refer to the LJSpeech example.)

Results

| --arch | Params | Test MCD | Model |
| --- | --- | --- | --- |
| tts_transformer | 54M | 3.4 | Download |
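Test MCD is mel-cepstral distortion (in dB) between synthesized and reference speech. A common per-frame formula is (10 / ln 10) · sqrt(2 · Σᵢ (cᵢ − ĉᵢ)²) over mel-cepstral coefficients; the exact variant (coefficient order, whether the 0th coefficient is excluded, alignment method) used by the evaluation scripts is not stated here, so treat this as a sketch of the metric, not the evaluation code.

```python
import math

def mcd_frame(ref, syn):
    # Per-frame mel-cepstral distortion in dB:
    # (10 / ln 10) * sqrt(2 * sum_i (c_i - c_hat_i)^2)
    sq = sum((a - b) ** 2 for a, b in zip(ref, syn))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * sq)

# Identical frames give zero distortion.
print(mcd_frame([1.0, -0.5, 0.2], [1.0, -0.5, 0.2]))  # 0.0
```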

[Back]