Space: GreenCounsel/SpeechT5-sv

CEHB committed on Jul 19, 2023

Commit e970b40 • 1 parent: 3887b31

Update app.py

Files changed (1)

app.py +11 -12

@@ -59,24 +59,24 @@ def predict(text, speaker):
     return (16000, speech)
-title = "SpeechT5: Speech Synthesis"
 description = """
-The <b>SpeechT5</b> model is pre-trained on text as well as speech inputs, with targets that are also a mix of text and speech.
-By pre-training on text and speech at the same time, it learns unified representations for both, resulting in improved modeling capabilities.
-SpeechT5 can be fine-tuned for different speech tasks. This space demonstrates the <b>text-to-speech</b> (TTS) checkpoint for the English language.
-See also the <a href="https://huggingface.co/spaces/Matthijs/speecht5-asr-demo">speech recognition (ASR) demo</a>
-and the <a href="https://huggingface.co/spaces/Matthijs/speecht5-vc-demo">voice conversion demo</a>.
-Refer to <a href="https://colab.research.google.com/drive/1i7I5pzBcU3WDFarDnzweIj4-sVVoIUFJ">this Colab notebook</a> to learn how to fine-tune the SpeechT5 TTS model on your own dataset or language.
-<b>How to use:</b> Enter some English text and choose a speaker. The output is a mel spectrogram, which is converted to a mono 16 kHz waveform by the
-HiFi-GAN vocoder. Because the model always applies random dropout, each attempt will give slightly different results.
-The <em>Surprise Me!</em> option creates a completely randomized speaker.
 """
 article = """
 <div style='margin:20px auto;'>
 <p>References: <a href="https://arxiv.org/abs/2110.07205">SpeechT5 paper</a> |
-<a href="https://github.com/microsoft/SpeechT5/">original GitHub</a> |
 <a href="https://huggingface.co/mechanicalsea/speecht5-tts">original weights</a></p>
 <pre>
 @article{Ao2021SpeechT5,
@@ -88,7 +88,6 @@ article = """
   year={2021}
 }
 </pre>
-<p>Speaker embeddings were generated from <a href="http://www.festvox.org/cmu_arctic/">CMU ARCTIC</a> using <a href="https://huggingface.co/mechanicalsea/speecht5-vc/blob/main/manifest/utils/prep_cmu_arctic_spkemb.py">this script</a>.</p>
 </div>
 """

     return (16000, speech)
+title = "SpeechT5 fine-tuned for Swedish TTS"
 description = """
+SpeechT5 text-to-speech model fine-tuned on Swedish data from the
+Common Voice dataset. Inference runs on a basic CPU (2 vCPU, 16 GB RAM), so
+please be patient if it takes some time. As a company founded by a female
+coder, our resources are extremely limited (female founders in tech only get approx.
+1 % of the venture capital, and the women who receive funding are seldom the
+ones actually handling the tech). We are in a very biased sphere where
+female coders' companies seldom get the resources that would normally
+be necessary to do what they do. The app uses the SpeechT5 model
+fine-tuned for Swedish by GreenCounsel, available here: [https://huggingface.co/GreenCounsel/speecht5_tts_common_voice_5_sv](https://huggingface.co/GreenCounsel/speecht5_tts_common_voice_5_sv).
 """
 article = """
 <div style='margin:20px auto;'>
 <p>References: <a href="https://arxiv.org/abs/2110.07205">SpeechT5 paper</a> |
+<a href="https://github.com/microsoft/SpeechT5/">original SpeechT5</a> |
 <a href="https://huggingface.co/mechanicalsea/speecht5-tts">original weights</a></p>
 <pre>
 @article{Ao2021SpeechT5,
   year={2021}
 }
 </pre>
 </div>
 """