---
license: apache-2.0
language:
- en
library_name: transformers
---

# FastSpeech2ConformerWithHifiGan

This model combines [FastSpeech2Conformer](https://huggingface.co/espnet/fastspeech2_conformer) and [FastSpeech2ConformerHifiGan](https://huggingface.co/espnet/fastspeech2_conformer_hifigan) into a single model for simpler, more convenient usage.

FastSpeech2Conformer is a non-autoregressive text-to-speech (TTS) model that combines the strengths of FastSpeech2 and the conformer architecture to generate high-quality speech from text quickly and efficiently. The HiFi-GAN vocoder then converts the generated mel-spectrograms into speech waveforms.

## 🤗 Transformers Usage

You can run FastSpeech2Conformer locally with the 🤗 Transformers library.

1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) and g2p-en:

```
pip install --upgrade pip
pip install --upgrade transformers g2p-en
```

2. Run inference via the Transformers modelling code with the model and HiFi-GAN combined:

```python
from transformers import FastSpeech2ConformerTokenizer, FastSpeech2ConformerWithHifiGan
import soundfile as sf

# Tokenize the input text into phoneme ids
tokenizer = FastSpeech2ConformerTokenizer.from_pretrained("espnet/fastspeech2_conformer")
inputs = tokenizer("Hello, my dog is cute.", return_tensors="pt")
input_ids = inputs["input_ids"]

# The combined model runs the acoustic model and the HiFi-GAN vocoder in one forward pass
model = FastSpeech2ConformerWithHifiGan.from_pretrained("espnet/fastspeech2_conformer_with_hifigan")
output_dict = model(input_ids, return_dict=True)
waveform = output_dict["waveform"]

# Save the audio at the model's 22.05 kHz sampling rate
sf.write("speech.wav", waveform.squeeze().detach().numpy(), samplerate=22050)
```