Commit 7c7b76c by connor-henderson (parent: 7aacc08): Create README.md
---
license: apache-2.0
language:
- en
library_name: transformers
---

# FastSpeech2ConformerWithHifiGan

<!-- Provide a quick summary of what the model is/does. -->

This model combines [FastSpeech2Conformer](https://huggingface.co/espnet/fastspeech2_conformer) and [FastSpeech2ConformerHifiGan](https://huggingface.co/espnet/fastspeech2_conformer_hifigan) into a single model for simpler, more convenient usage.

FastSpeech2Conformer is a non-autoregressive text-to-speech (TTS) model that combines the strengths of FastSpeech2 and the Conformer architecture to generate high-quality speech from text quickly and efficiently. The HiFi-GAN vocoder then converts the generated mel-spectrograms into speech waveforms.

## 🤗 Transformers Usage

You can run FastSpeech2ConformerWithHifiGan locally with the 🤗 Transformers library.

1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) and g2p-en:

```
pip install --upgrade pip
pip install --upgrade transformers g2p-en
```

2. Run inference via the Transformers modelling code with the model and HiFi-GAN combined:

```python
from transformers import FastSpeech2ConformerTokenizer, FastSpeech2ConformerWithHifiGan
import soundfile as sf

# Tokenize the input text into phoneme IDs (g2p-en handles grapheme-to-phoneme conversion)
tokenizer = FastSpeech2ConformerTokenizer.from_pretrained("espnet/fastspeech2_conformer")
inputs = tokenizer("Hello, my dog is cute.", return_tensors="pt")
input_ids = inputs["input_ids"]

# The combined model generates the mel-spectrogram and vocodes it in one forward pass
model = FastSpeech2ConformerWithHifiGan.from_pretrained("espnet/fastspeech2_conformer_with_hifigan")
output_dict = model(input_ids, return_dict=True)
waveform = output_dict["waveform"]

sf.write("speech.wav", waveform.squeeze().detach().numpy(), samplerate=22050)
```