Readme updates
README.md CHANGED
@@ -21,7 +21,9 @@ datasets:
# Aina Project's Catalan multi-speaker text-to-speech model
## Model description
-This model was trained from scratch using the [Coqui TTS](https://github.com/coqui-ai/TTS) toolkit on a combination of 3 datasets: [Festcat](http://festcat.talp.cat/devel.php), [
+This model was trained from scratch using the [Coqui TTS](https://github.com/coqui-ai/TTS) toolkit on a combination of 3 datasets: [Festcat](http://festcat.talp.cat/devel.php), the high-quality open speech dataset from [Google](http://openslr.org/69/), and [Common Voice v8](https://commonvoice.mozilla.org/ca). For training, 101,460 utterances from 257 speakers were used, corresponding to nearly 138 hours of speech.
+
+A live inference demo can be found in our Hugging Face Spaces, [here](https://huggingface.co/spaces/projecte-aina/tts-ca-coqui-vits-multispeaker).
## Intended uses and limitations
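For context on running inference with the released model: the usage snippet referenced in the next hunk header calls `wavs = synthesizer.tts(text, speaker_idx)`. A minimal sketch of that flow with the Coqui TTS `Synthesizer` class might look like the following; the checkpoint and config file names and the speaker ID are assumptions, so check the model repository for the actual values.

```python
from TTS.utils.synthesizer import Synthesizer

# File names and the speaker ID below are placeholders; check the model
# repository for the actual checkpoint/config names and available speakers.
synthesizer = Synthesizer(
    tts_checkpoint="model.pth",     # assumed checkpoint file name
    tts_config_path="config.json",  # assumed config file name
    use_cuda=False,
)

text = "Bon dia, benvinguts al projecte Aina."
speaker_idx = "ona"  # assumed speaker ID

# Same call that appears as context in the following hunk header.
wavs = synthesizer.tts(text, speaker_idx)
synthesizer.save_wav(wavs, "output.wav")
```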
@@ -67,7 +69,7 @@ wavs = synthesizer.tts(text, speaker_idx)
## Training
### Training Procedure
### Data preparation
-The data has been processed using the script [process_data.sh](https://huggingface.co/projecte-aina/tts-
+The data has been processed using the script [process_data.sh](https://huggingface.co/projecte-aina/tts-ca-coqui-vits-multispeaker/blob/main/data_processing/process_data.sh), which reduces the sampling frequency of the audio files, removes silences, adds padding, and structures the data into the format accepted by the framework. You can find more information [here](https://huggingface.co/projecte-aina/tts-ca-coqui-vits-multispeaker/blob/main/data_processing/README.md).
### Hyperparameter
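The data-preparation line above points to `process_data.sh` for the actual steps. Purely as an illustration of the operations it describes (reducing the sampling frequency, removing silences, adding padding, and laying the data out for the framework), a rough Python equivalent could look like the sketch below; the target sample rate, trim threshold, padding length, and metadata layout are assumptions rather than values taken from the script.

```python
from pathlib import Path

import librosa
import numpy as np
import soundfile as sf

# Illustrative values only; the real settings live in process_data.sh.
TARGET_SR = 22050   # assumed reduced sampling frequency
TOP_DB = 30         # assumed silence-trimming threshold
PAD_SECONDS = 0.1   # assumed padding added at each end

def process_clip(in_path: Path, out_path: Path) -> None:
    """Downsample, trim silence, and pad a single utterance."""
    wav, _ = librosa.load(in_path, sr=TARGET_SR)        # resample while loading
    wav, _ = librosa.effects.trim(wav, top_db=TOP_DB)   # drop leading/trailing silence
    pad = np.zeros(int(PAD_SECONDS * TARGET_SR), dtype=wav.dtype)
    wav = np.concatenate([pad, wav, pad])                # add padding on both sides
    sf.write(out_path, wav, TARGET_SR)

def write_metadata(rows, out_file: Path) -> None:
    """Write 'audio_id|transcript' lines, an LJSpeech-style layout that Coqui TTS
    formatters can ingest (assumed, not necessarily the script's exact format)."""
    with out_file.open("w", encoding="utf-8") as f:
        for audio_id, transcript in rows:
            f.write(f"{audio_id}|{transcript}\n")
```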
@@ -116,7 +118,7 @@ Copyright (c) 2022 Text Mining Unit at Barcelona Supercomputing Center
[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
### Funding
-This work was funded by the [
+This work was funded by the [Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en)) within the framework of [Projecte AINA](https://politiquesdigitals.gencat.cat/ca/economia/catalonia-ai/aina).
## Disclaimer