---
license: apache-2.0
language:
- en
library_name: transformers
---

# FastSpeech2ConformerWithHifiGan

This model combines [FastSpeech2Conformer](https://huggingface.co/espnet/fastspeech2_conformer) and [FastSpeech2ConformerHifiGan](https://huggingface.co/espnet/fastspeech2_conformer_hifigan) into a single model for simpler, more convenient usage.

FastSpeech2Conformer is a non-autoregressive text-to-speech (TTS) model that combines the strengths of FastSpeech2 and the conformer architecture to generate high-quality speech from text quickly and efficiently. The HiFi-GAN vocoder then converts the generated mel-spectrograms into speech waveforms.

## 🤗 Transformers Usage

You can run FastSpeech2Conformer locally with the 🤗 Transformers library.

1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) and g2p-en:

```
pip install --upgrade pip
pip install --upgrade transformers g2p-en
```

2. Run inference via the Transformers modelling code with the model and HiFi-GAN combined:

```python
from transformers import FastSpeech2ConformerTokenizer, FastSpeech2ConformerWithHifiGan
import soundfile as sf

# Tokenize the input text into phoneme ids
tokenizer = FastSpeech2ConformerTokenizer.from_pretrained("espnet/fastspeech2_conformer")
inputs = tokenizer("Hello, my dog is cute.", return_tensors="pt")
input_ids = inputs["input_ids"]

# The combined model runs the acoustic model and the HiFi-GAN vocoder in one forward pass
model = FastSpeech2ConformerWithHifiGan.from_pretrained("espnet/fastspeech2_conformer_with_hifigan")
output_dict = model(input_ids, return_dict=True)
waveform = output_dict["waveform"]

# Save the audio at the model's 22.05 kHz sampling rate
sf.write("speech.wav", waveform.squeeze().detach().numpy(), samplerate=22050)
```