---
language: "en"
inference: false
tags:
- Vocoder
- HiFIGAN
- speech-synthesis
- speechbrain
license: "apache-2.0"
datasets:
- LJSpeech
---

# Vocoder with HiFIGAN Unit trained on LJSpeech

This repository provides all the necessary tools for using a [HiFiGAN Unit](https://arxiv.org/abs/2104.00355) vocoder trained with [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).

The pre-trained model takes discrete self-supervised representations as input and produces a waveform as output. Typically, this model is used on top of a speech-to-unit translation model that converts an input utterance from a source language into a sequence of discrete speech units in a target language.

To generate the discrete self-supervised representations, we employ a K-means clustering model trained on the 6th layer of HuBERT, with `k=100`.

## Install SpeechBrain

First of all, please install transformers and SpeechBrain with the following command:

```
pip install speechbrain transformers
```

Please note that we encourage you to read our tutorials and learn more about [SpeechBrain](https://speechbrain.github.io).

### Using the Vocoder

```python
import torch
from speechbrain.inference.vocoders import UnitHIFIGAN

# Download (or load from savedir) the pre-trained unit vocoder
hifi_gan_unit = UnitHIFIGAN.from_hparams(source="speechbrain/tts-hifigan-unit-hubert-l6-k100-ljspeech", savedir="pretrained_models/tts-hifigan-unit-hubert-l6-k100-ljspeech")

# Random unit IDs for demonstration; torch.randint's upper bound is exclusive,
# so 100 covers the full k=100 range (IDs 0..99)
codes = torch.randint(0, 100, (100,))
waveform = hifi_gan_unit.decode_unit(codes)
```

### Using the Vocoder with the S2UT

```python
import torch
import torchaudio
from speechbrain.inference.ST import EncoderDecoderS2UT
from speechbrain.inference.vocoders import UnitHIFIGAN

# Initialize S2UT (Transformer) and Vocoder (HiFIGAN Unit)
s2ut = EncoderDecoderS2UT.from_hparams(source="speechbrain/s2st-transformer-fr-en-hubert-l6-k100-cvss", savedir="pretrained_models/s2st-transformer-fr-en-hubert-l6-k100-cvss")
hifi_gan_unit = UnitHIFIGAN.from_hparams(source="speechbrain/tts-hifigan-unit-hubert-l6-k100-ljspeech", savedir="pretrained_models/tts-hifigan-unit-hubert-l6-k100-ljspeech")

# Running the S2UT model (speech-to-units)
codes = s2ut.translate_file("speechbrain/s2st-transformer-fr-en-hubert-l6-k100-cvss/example-fr.wav")
codes = torch.IntTensor(codes)

# Running Vocoder (units-to-waveform)
waveforms = hifi_gan_unit.decode_unit(codes)

# Save the waveform at a 16 kHz sample rate
torchaudio.save("example.wav", waveforms.squeeze(1), 16000)
```

### Inference on GPU

To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.

### Limitations

The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

#### Referencing SpeechBrain

```
@misc{SB2021,
    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
    title = {SpeechBrain},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
}
```

#### About SpeechBrain

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains.

Website: https://speechbrain.github.io/

GitHub: https://github.com/speechbrain/speechbrain
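
### Example: selecting the inference device

The "Inference on GPU" section passes `run_opts={"device":"cuda"}` to `from_hparams`. On machines that may or may not have a GPU, one option is to build that dictionary at runtime. The snippet below is a minimal sketch of that idea, not part of SpeechBrain: `select_run_opts` is a hypothetical helper name, and it assumes that falling back to `"cpu"` is acceptable when PyTorch is missing or reports no CUDA device.

```python
import importlib.util


def select_run_opts() -> dict:
    """Build a run_opts dict, preferring CUDA when PyTorch reports it.

    Hypothetical helper for illustration; not a SpeechBrain function.
    """
    device = "cpu"
    # Only import torch if it is installed, so the helper also works without it.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            device = "cuda"
    return {"device": device}


run_opts = select_run_opts()
print(run_opts)

# The result can then be passed to the loader shown earlier, for example:
# hifi_gan_unit = UnitHIFIGAN.from_hparams(
#     source="speechbrain/tts-hifigan-unit-hubert-l6-k100-ljspeech",
#     savedir="pretrained_models/tts-hifigan-unit-hubert-l6-k100-ljspeech",
#     run_opts=run_opts,
# )
```

If the model runs on the GPU, the decoded waveform will likely live there as well, so it may need to be moved back with `.cpu()` before saving it with `torchaudio.save`.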