---
license: apache-2.0
language:
- en
library_name: transformers
---
# FastSpeech2ConformerWithHifiGan
This model combines [FastSpeech2Conformer](https://huggingface.co/espnet/fastspeech2_conformer) and [FastSpeech2ConformerHifiGan](https://huggingface.co/espnet/fastspeech2_conformer_hifigan) into a single model for simpler, more convenient use.
FastSpeech2Conformer is a non-autoregressive text-to-speech (TTS) model that combines the strengths of FastSpeech2 and the conformer architecture to generate high-quality speech from text quickly and efficiently. The HiFi-GAN vocoder then converts the generated mel-spectrograms into speech waveforms.
## 🤗 Transformers Usage
You can run FastSpeech2Conformer locally with the 🤗 Transformers library.
1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers), g2p-en, and soundfile (used in the next step to write the audio file):
```
pip install --upgrade pip
pip install --upgrade transformers g2p-en soundfile
```
2. Run inference via the Transformers modeling code, with the model and the HiFi-GAN vocoder combined:
```python
from transformers import FastSpeech2ConformerTokenizer, FastSpeech2ConformerWithHifiGan
import soundfile as sf

# Convert the input text into token IDs.
tokenizer = FastSpeech2ConformerTokenizer.from_pretrained("espnet/fastspeech2_conformer")
inputs = tokenizer("Hello, my dog is cute.", return_tensors="pt")
input_ids = inputs["input_ids"]

# Load the combined acoustic model and vocoder, then synthesize the waveform.
model = FastSpeech2ConformerWithHifiGan.from_pretrained("espnet/fastspeech2_conformer_with_hifigan")
output_dict = model(input_ids, return_dict=True)
waveform = output_dict["waveform"]

# Save the audio as a WAV file at a 22,050 Hz sampling rate.
sf.write("speech.wav", waveform.squeeze().detach().numpy(), samplerate=22050)
```
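If `soundfile` is not available, the final write step could in principle be handled with Python's standard library. The following is a minimal sketch, not part of the model's API: `write_wav_16bit` is a hypothetical helper, the 22,050 Hz rate is taken from the example above, and a synthetic 440 Hz tone stands in for the model output so the snippet runs without downloading any weights. It assumes the samples are mono floats in the range [-1.0, 1.0].

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 22050  # sampling rate used in the example above


def write_wav_16bit(path, samples, sample_rate=SAMPLE_RATE):
    """Hypothetical helper: write mono float samples as 16-bit PCM WAV.

    Returns the duration of the written audio in seconds.
    """
    # Clip each sample to [-1.0, 1.0], then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm) / sample_rate


# Stand-in for the model output: one second of a 440 Hz sine tone.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
out_path = os.path.join(tempfile.gettempdir(), "tone.wav")
duration = write_wav_16bit(out_path, tone)
```

With the real model output, the samples could be passed as `waveform.squeeze().detach().tolist()`. Building a Python list is slower than the NumPy path used by `soundfile`, so this is mainly useful where installing extra dependencies is not an option.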