license: mit
language:
- ko
pipeline_tag: text-to-speech
Taein-TTS
Description
Taein-TTS is a text-to-speech (TTS) project that reads sentences in my own voice. This repository includes pre-trained models trained on my voice.
Installation
This README focuses on guiding you through the process of synthesizing speech using pre-trained models, rather than detailing the model training process.
Clone the Hugging Face repository: https://huggingface.co/icecream0910/taein-tts
Modify the run-server.bat batch file in the /server directory to match your actual file paths. For example, if your server folder is at C:\myown-tts\server, update the file as follows:

    @echo off
    setlocal
    cd /D "%~dp0"
    set MECAB_KO_DIC_PATH=.\mecab\mecab-ko-dic -r .\mecab\mecabrc
    set TTS_MODEL_FILE=C:\myown-tts\server\models\glowtts-v2\best_model.pth.tar
    set TTS_MODEL_CONFIG=C:\myown-tts\server\models\glowtts-v2\config.json
    set VOCODER_MODEL_FILE=C:\myown-tts\server\models\hifigan-v2\best_model.pth.tar
    set VOCODER_MODEL_CONFIG=C:\myown-tts\server\models\hifigan-v2\config.json
    server.exe
    endlocal
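A wrong path in the batch file is easy to miss until the server fails to load a model. The sketch below is one way to check the four model settings before launching. It assumes the variables are already set in the environment it runs in; the variable names come from run-server.bat, while missing_model_files is a hypothetical helper and not part of this project.

```python
import os

# Variable names taken from run-server.bat; the check itself is only an illustration.
REQUIRED_VARS = (
    "TTS_MODEL_FILE",
    "TTS_MODEL_CONFIG",
    "VOCODER_MODEL_FILE",
    "VOCODER_MODEL_CONFIG",
)


def missing_model_files(env):
    """Return the names of variables that are unset or do not point to a file."""
    missing = []
    for name in REQUIRED_VARS:
        path = env.get(name)
        if not path or not os.path.isfile(path):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_model_files(os.environ)
    if problems:
        print("Check these settings:", ", ".join(problems))
    else:
        print("All model paths point to existing files.")
```

Passing the environment as an argument keeps the helper easy to try with a plain dictionary instead of the real environment.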
Update the glowtts-v2/config.json and hifigan-v2/config.json files in the /server/models/ directory with your actual file paths. Double each backslash (\\) in the file paths, as shown below:
- For glowtts-v2/config.json:
  "stats_path": "C:\\mydata\\tts-server\\models\\glowtts-v2\\scale_stats.npy"
- For hifigan-v2/config.json:
  "stats_path": "C:\\mydata\\tts-server\\models\\hifigan-v2\\scale_stats.npy"
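The backslashes are doubled because a backslash is the escape character inside a JSON string. As a small illustration, Python's json module produces the doubled form from an ordinary Windows path; json_fragment is a hypothetical helper, and the path is the example path from the steps above.

```python
import json


def json_fragment(key, windows_path):
    """Render one "key": "value" pair as it must appear inside a JSON file."""
    return f"{json.dumps(key)}: {json.dumps(windows_path)}"


# A raw string keeps the single backslashes of an ordinary Windows path.
path = r"C:\mydata\tts-server\models\glowtts-v2\scale_stats.npy"

# Prints the pair with every backslash doubled, ready to paste into config.json.
print(json_fragment("stats_path", path))

# Reading the JSON back gives the original single-backslash path.
assert json.loads(json.dumps(path)) == path
```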
Usage
To start the TTS server, execute run-server.bat. Once the server is running, you will see the message INFO:werkzeug: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit) in the command prompt, indicating that speech synthesis is available through the TTS server. To stop the server, press CTRL+C in the command prompt.
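When the server is started from a script, one way to detect readiness without watching the command prompt is to poll the port until it accepts connections. This is a minimal sketch, assuming the server listens on port 5000 as in the log message above; wait_for_server is a hypothetical helper, not part of this project.

```python
import socket
import time


def wait_for_server(host="localhost", port=5000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait briefly and try again.
            time.sleep(interval)
    return False
```

A successful connection only shows that the port is open; it does not confirm that the models have finished loading.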
API
Text preprocessing: /tts-server/api/process-text
Splits the input into sentences and removes special characters, so that multi-line text can be automatically stitched together and played back as you type.
Text inference: /tts-server/api/infer-glowtts
Synthesizes text to speech. Send the text to be synthesized in the text parameter of the URL. Example: http://localhost:5000/tts-server/api/infer-glowtts?text=hello
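Text that contains Korean characters or spaces needs to be percent-encoded before it goes into the URL. The sketch below builds the inference URL with the standard library and saves whatever the server returns. The endpoint path and the text parameter come from the description above; build_infer_url and synthesize are hypothetical helpers, and the response is assumed to be audio data, since its format is not stated here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # default address from the Usage section


def build_infer_url(text, base_url=BASE_URL):
    """Build the infer-glowtts URL with the text percent-encoded."""
    return f"{base_url}/tts-server/api/infer-glowtts?{urlencode({'text': text})}"


def synthesize(text, out_path, base_url=BASE_URL):
    """Save the raw response body to out_path and return its size in bytes.

    Assumption: the body is audio data; it is written unchanged.
    """
    with urlopen(build_infer_url(text, base_url)) as response:
        data = response.read()
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)
```

With the server running, a call such as synthesize("안녕하세요", "out.wav") would store the response; the .wav extension is a guess about the audio format.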
Text Inference Demo Page
Visit http://localhost:5000/ for a demo.
Contributing
- Fork the repository (https://github.com/icecream0910/myown-tts/fork).
- Create a new branch: git checkout -b feature/<featureName>.
- Commit your changes: git commit -am 'Add <featureName>'.
- Push to the branch: git push origin feature/<featureName>.
- Submit a pull request.
License
This project is licensed under the MIT License.
References
This implementation draws inspiration from the following repositories:
The datasets below are distributed under the CC-BY 2.0 license. Their original text data, including Korean dialogue text data and the Korean-English translation (parallel) corpus text data, was provided by AI Hub, which is operated by the National Information Society Agency (NIA).
- Korean Corpus for Voice Recording
- SleepingCE Speech Dataset
- Pre-trained Models for SleepingCE Speech Dataset (Glow-TTS)
- Pre-trained Models for SleepingCE Speech Dataset (HiFi-GAN)
- These models were fine-tuned from the model provided by coqui-ai/TTS, trained on the VCTK dataset, available here.