---
license: mit
language:
- ko
pipeline_tag: text-to-speech
---
# Taein-TTS
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
## Description
Taein-TTS is a project that builds a text-to-speech (TTS) system that reads sentences aloud in my own voice. This repository provides the pre-trained models, which were trained on recordings of my voice.
## Table of Contents
- [Installation](#installation)
- [Usage](#usage)
- [Contributing](#contributing)
- [License](#license)
## Installation
This README focuses on guiding you through the process of synthesizing speech using pre-trained models, rather than detailing the model training process.
1. Clone the Hugging Face repository:
[https://huggingface.co/icecream0910/taein-tts](https://huggingface.co/icecream0910/taein-tts)
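A minimal sketch of the clone step, assuming Git and Git LFS are installed (Git LFS is typically needed to pull the large model checkpoints):
```bat
:: Assumption: the pre-trained models are stored via Git LFS, as is common on Hugging Face.
git lfs install
git clone https://huggingface.co/icecream0910/taein-tts
```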
2. Modify the `run-server.bat` batch file in the `/server` directory to match your actual file paths.
For example, if your server folder is at `C:\myown-tts\server`, update the file as follows:
```bat
@echo off
setlocal
cd /D "%~dp0"
set MECAB_KO_DIC_PATH=.\mecab\mecab-ko-dic -r .\mecab\mecabrc
set TTS_MODEL_FILE=C:\myown-tts\server\models\glowtts-v2\best_model.pth.tar
set TTS_MODEL_CONFIG=C:\myown-tts\server\models\glowtts-v2\config.json
set VOCODER_MODEL_FILE=C:\myown-tts\server\models\hifigan-v2\best_model.pth.tar
set VOCODER_MODEL_CONFIG=C:\myown-tts\server\models\hifigan-v2\config.json
server.exe
endlocal
```
3. Update the `glowtts-v2/config.json` and `hifigan-v2/config.json` files in the `/server/models/` directory with your actual file paths.
Because JSON strings require escaped backslashes, double each backslash (`\\`) in the file paths, as shown below:
- For `glowtts-v2/config.json`:
```json
"stats_path": "C:\\mydata\\tts-server\\models\\glowtts-v2\\scale_stats.npy"
```
- For `hifigan-v2/config.json`:
```json
"stats_path": "C:\\mydata\\tts-server\\models\\hifigan-v2\\scale_stats.npy"
```
## Usage
To start the TTS server, run `run-server.bat`. Once the server is running, the command prompt shows `INFO:werkzeug: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)`, indicating that speech synthesis is available through the TTS server. To stop the server, press CTRL+C in the command prompt.
### API
- Text preprocessing: `/tts-server/api/process-text`
Splits the input into sentences and removes special characters so that multi-line input can be automatically stitched together and played back as you type.
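For example, assuming this endpoint also accepts the input as a `text` query parameter (the source only documents that parameter for the inference endpoint below):
```bat
:: Assumption: /process-text takes the raw input through a "text" query parameter,
:: mirroring the inference endpoint. curl ships with Windows 10 and later.
curl -G "http://localhost:5000/tts-server/api/process-text" --data-urlencode "text=안녕하세요. 만나서 반갑습니다."
```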
- Text Inference: `/tts-server/api/infer-glowtts`
Synthesizes text to speech. Send the text to be synthesized in the `text` parameter of the URL.
Example:
```
http://localhost:5000/tts-server/api/infer-glowtts?text=hello
```
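To fetch the synthesized audio from the command line, here is a minimal sketch using curl (bundled with Windows 10 and later); the WAV output format and the file name `hello.wav` are assumptions:
```bat
:: Assumption: the endpoint responds with audio data that can be written directly to a .wav file.
curl -G "http://localhost:5000/tts-server/api/infer-glowtts" --data-urlencode "text=hello" -o hello.wav
```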
### Text Inference Demo Page
Visit [http://localhost:5000/](http://localhost:5000/) for a demo.
## Contributing
1. Fork the repository (https://github.com/icecream0910/myown-tts/fork).
2. Create a new branch: `git checkout -b feature/<featureName>`.
3. Commit your changes: `git commit -am 'Add <featureName>'`.
4. Push to the branch: `git push origin feature/<featureName>`.
5. Submit a pull request.
## License
This project is licensed under the [MIT License](LICENSE).
## References
This implementation draws inspiration from the following repositories:
- [SCE-TTS](https://github.com/sce-tts)
- [g2pK](https://github.com/Kyubyong/g2pK)
- [mimic-recording-studio](https://github.com/MycroftAI/mimic-recording-studio)
- [coqui TTS](https://github.com/coqui-ai/TTS)
The datasets below are distributed under the CC BY 2.0 license. The original text data, including Korean dialogue text and Korean-English translation (parallel) corpus text, was provided through the National Information Society Agency's AI Hub.
- [Korean Corpus for Voice Recording](https://github.com/sce-tts/mimic-recording-studio/blob/master/backend/prompts/korean_corpus.csv)
- [SleepingCE Speech Dataset](https://drive.google.com/file/d/1UpoBaZRTJXkTdsoemLBWV48QClm6hpTX/view?usp=sharing)
- [Pre-trained Models for SleepingCE Speech Dataset (Glow-TTS)](https://drive.google.com/file/d/1DMKLdfZ_gzc_z0qDod6_G8fEXj0zCHvC/view?usp=sharing)
- [Pre-trained Models for SleepingCE Speech Dataset (HiFi-GAN)](https://drive.google.com/file/d/1vRxp1RH-U7gSzWgyxnKY4h_7pB3tjPmU/view?usp=sharing)
- These models were fine-tuned from the model provided by [coqui-ai/TTS](https://github.com/coqui-ai/TTS), trained on the [VCTK dataset](https://datashare.ed.ac.uk/handle/10283/3443), available [here](https://github.com/coqui-ai/TTS/releases/download/v0.0.12/vocoder_model--en--vctk--hifigan_v2.zip).