---
license: apache-2.0
language:
- de
library_name: nemo
tags:
- tts
- pytorch
- FastPitch
- speech
pipeline_tag: text-to-speech
---

This FastPitch[1] model was trained on the HUI-Audio-Corpus-German[2] clean dataset using the NeMo Toolkit[3]. We selected the five speakers with the most data and balanced the training data across speakers (around 20 hours per speaker).

This is a retrained version of: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_de_fastpitch_multispeaker_5

# How to Use:

Use with NeMo Toolkit version 1.14.0.

```python
import torchaudio
from nemo.collections.tts.models import FastPitchModel, HifiGanModel

# Load the spectrogram generator from a local .nemo checkpoint
spec_generator = FastPitchModel.restore_from("path/to/model.nemo")

# Load the vocoder
vocoder = HifiGanModel.from_pretrained(model_name="tts_de_hui_hifigan_ft_fastpitch_multispeaker_5")

# Generate audio
# Replace the empty string with the German text to synthesize
parsed = spec_generator.parse("")
# Index of the speaker to synthesize
speaker_id = 0
spectrogram = spec_generator.generate_spectrogram(tokens=parsed, speaker=speaker_id)
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)

# Save the audio to disk in a file called german_speech.wav (44.1 kHz sample rate)
torchaudio.save('german_speech.wav', audio.detach().cpu(), 44100)
```

[1] FastPitch: Parallel Text-to-speech with Pitch Prediction: https://arxiv.org/abs/2006.06873

[2] HUI-Audio-Corpus-German Dataset: https://opendata.iisys.de/datasets.html

[3] NVIDIA NeMo Toolkit: https://github.com/NVIDIA/NeMo