---
license: mit
language:
  - ko
pipeline_tag: text-to-speech
---

Taein-TTS

Description

Taein-TTS is a text-to-speech (TTS) project that reads sentences aloud in my own voice. This repository includes pre-trained models trained on my voice.

- Installation
- Usage
- Contributing
- License

Installation

This README covers synthesizing speech with the pre-trained models. It does not cover the model training process.

Clone the Hugging Face repository: https://huggingface.co/icecream0910/taein-tts

Modify the run-server.bat batch file in the /server directory to match your actual file paths.

For example, if your server folder is at C:\myown-tts\server, update the file as follows:

```
@echo off
setlocal
cd /D "%~dp0"
set MECAB_KO_DIC_PATH=.\mecab\mecab-ko-dic -r .\mecab\mecabrc
set TTS_MODEL_FILE=C:\myown-tts\server\models\glowtts-v2\best_model.pth.tar
set TTS_MODEL_CONFIG=C:\myown-tts\server\models\glowtts-v2\config.json
set VOCODER_MODEL_FILE=C:\myown-tts\server\models\hifigan-v2\best_model.pth.tar
set VOCODER_MODEL_CONFIG=C:\myown-tts\server\models\hifigan-v2\config.json
server.exe
endlocal
```
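A wrong path in the batch file is an easy mistake to make, so it can help to check the four model paths before starting the server. The following is a minimal sketch, not part of this repository: the helper name `missing_model_files` is hypothetical, and it only assumes the variable names used in run-server.bat.

```python
import os
from typing import List, Mapping

# Variable names taken from run-server.bat.
MODEL_PATH_VARS = (
    "TTS_MODEL_FILE",
    "TTS_MODEL_CONFIG",
    "VOCODER_MODEL_FILE",
    "VOCODER_MODEL_CONFIG",
)


def missing_model_files(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the variable names that are unset or do not point to an existing file.

    Hypothetical helper for illustration; the server itself does not ship it.
    """
    missing = []
    for name in MODEL_PATH_VARS:
        path = env.get(name, "")
        if not path or not os.path.isfile(path):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_model_files()
    if problems:
        print("Check these variables:", ", ".join(problems))
    else:
        print("All model paths point to existing files.")
```

Because run-server.bat sets the variables between setlocal and endlocal, they are only visible to processes started from inside the batch file. To check them from elsewhere, pass the same values to `missing_model_files` as a dictionary.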

Update the glowtts-v2/config.json and hifigan-v2/config.json files in the /server/models/ directory with your actual file paths.

JSON treats a single backslash as an escape character, so write every backslash in a file path as a double backslash (\\). Continuing the C:\myown-tts\server example:
- For glowtts-v2/config.json:
```
"stats_path": "C:\\myown-tts\\server\\models\\glowtts-v2\\scale_stats.npy"
```
- For hifigan-v2/config.json:
```
"stats_path": "C:\\myown-tts\\server\\models\\hifigan-v2\\scale_stats.npy"
```
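If you would rather not escape the path by hand, Python's standard `json` module can produce the escaped form for you. The sketch below only prints a line to paste into config.json. It does not edit the file, since the exact layout of the config files is not described here. The helper name `json_path_entry` is hypothetical, and the example path assumes the server folder is at C:\myown-tts\server.

```python
import json


def json_path_entry(key: str, windows_path: str) -> str:
    """Format one JSON key/value line with the backslashes escaped.

    Hypothetical helper for illustration only.
    """
    # json.dumps escapes each backslash as a double backslash and adds the quotes.
    return f"{json.dumps(key)}: {json.dumps(windows_path)}"


if __name__ == "__main__":
    # Raw string (r"...") so Python itself leaves the backslashes untouched.
    print(json_path_entry(
        "stats_path",
        r"C:\myown-tts\server\models\glowtts-v2\scale_stats.npy",
    ))
```

Paste the printed line over the existing "stats_path" entry in the matching config.json.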

Usage

To start the TTS server, run run-server.bat. When the server is ready, the command prompt shows the message INFO:werkzeug: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit), which means speech synthesis is available through the TTS server. To stop the server, press CTRL+C in the command prompt.
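If a script needs to wait until the server is up instead of watching the command prompt, it can poll the port. This is a minimal sketch under the assumption that the server listens on port 5000 of the local machine, as in the message above. The function name `wait_for_server` is hypothetical.

```python
import socket
import time


def wait_for_server(host: str = "127.0.0.1", port: int = 5000,
                    timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not accepting connections yet; wait and retry.
            time.sleep(interval)
    return False
```

A successful connection only shows that the port is open. It does not confirm that the models have finished loading.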

API

Text preprocessing: /tts-server/api/process-text

Splits the input into sentences and removes special characters, so that multi-line text is automatically stitched together and played back as you type.

Text inference: /tts-server/api/infer-glowtts

Synthesizes speech from text. Pass the text to synthesize in the text query parameter of the URL.

Example:
```
http://localhost:5000/tts-server/api/infer-glowtts?text=hello
```
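Korean text and spaces must be percent-encoded before they go into the URL. The sketch below builds the request URL with the standard library and saves the response to a file. It assumes the server runs at http://localhost:5000 and that the response body is the synthesized audio. The response format is not documented here, so treat that part as an assumption. The function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the server runs locally on the default port.
BASE_URL = "http://localhost:5000"


def build_infer_url(text: str, base_url: str = BASE_URL) -> str:
    """Build the infer-glowtts URL with the text safely percent-encoded."""
    query = urlencode({"text": text})
    return f"{base_url}/tts-server/api/infer-glowtts?{query}"


def synthesize_to_file(text: str, out_path: str,
                       base_url: str = BASE_URL, timeout: float = 60.0) -> int:
    """Request synthesis and write the raw response bytes to out_path.

    Assumes the response body is audio data; returns the number of bytes written.
    """
    with urlopen(build_infer_url(text, base_url), timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With the server running, `synthesize_to_file("안녕하세요", "hello.wav")` would send the request. The .wav extension is a guess at the audio format.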

Text Inference Demo Page

Visit http://localhost:5000/ for a demo.

Contributing

Fork the repository (https://github.com/icecream0910/myown-tts/fork).
Create a new branch: git checkout -b feature/<featureName>.
Commit your changes: git commit -am 'Add <featureName>'.
Push to the branch: git push origin feature/<featureName>.
Submit a pull request.

License

This project is licensed under the MIT License.

References

This implementation draws inspiration from the following repositories:

The datasets below are distributed under the CC-BY 2.0 license. Their source text comes from the Korean dialogue text data and the Korean-English translation (parallel) corpus text data provided by AI Hub, which is operated by the National Information Society Agency (NIA).

- Korean Corpus for Voice Recording
- SleepingCE Speech Dataset
- Pre-trained Models for SleepingCE Speech Dataset (Glow-TTS)
- Pre-trained Models for SleepingCE Speech Dataset (HiFi-GAN)
- These models were fine-tuned from the model provided by coqui-ai/TTS, trained on the VCTK dataset, available here.
