speechbrain/tts-mstacotron2-libritts

multi-speaker-tts


speechbrainteam committed on Oct 17, 2023

Commit 61d287c • 1 Parent(s): aad77c0

Update README.md

Files changed (1)

README.md +4 -1


@@ -17,7 +17,7 @@ metrics:
 # Text-to-Speech (TTS) with Zero-Shot Multi-Speaker Tacotron2 trained on LibriTTS
-### Note: This is a work in progress
 This repository provides all the necessary tools for Zero-Shot Multi-Speaker Text-to-Speech (TTS) with SpeechBrain using a variation of [Tacotron2](https://arxiv.org/abs/1712.05884), extended to incorporate speaker identity information when generating speech. It is pretrained on [LibriTTS](https://www.openslr.org/60/).
@@ -36,6 +36,9 @@ Please notice that we encourage you to read our tutorials and learn more about
 The following is an example of converting text-to-speech with the speaker voice characteristics extracted from reference speech.
 ```
 import torchaudio
 from speechbrain.pretrained import MSTacotron2

 # Text-to-Speech (TTS) with Zero-Shot Multi-Speaker Tacotron2 trained on LibriTTS
+### Note: This project is currently a work in progress. While the model is operational, we are now focusing on enhancing the quality of the generated voice
 This repository provides all the necessary tools for Zero-Shot Multi-Speaker Text-to-Speech (TTS) with SpeechBrain using a variation of [Tacotron2](https://arxiv.org/abs/1712.05884), extended to incorporate speaker identity information when generating speech. It is pretrained on [LibriTTS](https://www.openslr.org/60/).
 The following is an example of converting text-to-speech with the speaker voice characteristics extracted from reference speech.
+**Note:**
+- The model generates speech at a rate of 22050 Hz, but it's important to note that the input signal, crucial for capturing speaker identities, must be sampled at 16kHz.
 ```
 import torchaudio
 from speechbrain.pretrained import MSTacotron2