taein-tts / README.md

icecream0910

Update README.md

02ad397 verified 7 months ago

preview code

raw

history blame

No virus

4.65 kB

	---
	license: mit
	language:
	- ko
	pipeline_tag: text-to-speech
	---
	# Taein-TTS
	[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

	## Description

	Taein-TTS is a project aimed at creating a text-to-speech (TTS) system that reads sentences in my own voice. This repository includes pre-trained models that have been trained using my voice.

	## Table of Contents

	- [Installation](#installation)
	- [Usage](#usage)
	- [Contributing](#contributing)
	- [License](#license)

	## Installation

	This README focuses on guiding you through the process of synthesizing speech using pre-trained models, rather than detailing the model training process.

	1. Clone the huggingface repository:
	[https://huggingface.co/icecream0910/taein-tts](https://huggingface.co/icecream0910/taein-tts)

	2. Modify the `run-server.bat` batch file in the `/server` directory to match your actual file paths.

	For example, if your server folder is at `C:\myown-tts\server`, update the file as follows:

	```bat
	@echo off
	setlocal
	cd /D "%~dp0"
	set MECAB_KO_DIC_PATH=.\mecab\mecab-ko-dic -r .\mecab\mecabrc
	set TTS_MODEL_FILE=C:\myown-tts\server\models\glowtts-v2\best_model.pth.tar
	set TTS_MODEL_CONFIG=C:\myown-tts\server\models\glowtts-v2\config.json
	set VOCODER_MODEL_FILE=C:\myown-tts\server\models\hifigan-v2\best_model.pth.tar
	set VOCODER_MODEL_CONFIG=C:\myown-tts\server\models\hifigan-v2\config.json
	server.exe
	endlocal
	```

	3. Update the `glowtts-v2/config.json` and `hifigan-v2/config.json` files in the `/server/models/` directory with your actual file paths.

	Ensure you double the backslash (`\\`) in the file paths, as shown below:

	- For `glowtts-v2/config.json`:
	```json
	"stats_path": "C:\\mydata\\tts-server\\models\\glowtts-v2\\scale_stats.npy"
	```

	- For `hifigan-v2/config.json`:
	```json
	"stats_path": "C:\\mydata\\tts-server\\models\\hifigan-v2\\scale_stats.npy"
	```

	## Usage

	To start the TTS server, execute `run-server.bat`. Once the server is running, you will see the message `INFO:werkzeug: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)` in the command prompt, indicating that the speech synthesis feature is available through the TTS server. To stop the server, press CTRL+C in the command prompt.

	### API

	- Text preprocessing: `/tts-server/api/process-text`

	Splits sentences and removes special characters to automatically stitch together and playback multi-line sentences as you type.

	- Text Inference: `/tts-server/api/infer-glowtts`

	Synthesizes text to speech. Send the text to be synthesized in the `text` parameter of the URL.

	Example:
	```
	http://localhost:5000/tts-server/api/infer-glowtts?text=hello
	```

	### Text Inference Demo Page

	Visit [http://localhost:5000/](http://localhost:5000/) for a demo.

	## Contributing

	1. Fork the repository (https://github.com/icecream0910/myown-tts/fork).
	2. Create a new branch: `git checkout -b feature/<featureName>`.
	3. Commit your changes: `git commit -am 'Add <featureName>'`.
	4. Push to the branch: `git push origin feature/<featureName>`.
	5. Submit a pull request.

	## License

	This project is licensed under the [MIT License](LICENSE).

	## References

	This implementation draws inspiration from the following repositories:

	- [SCE-TTS](https://github.com/sce-tts)
	- [g2pK](https://github.com/Kyubyong/g2pK)
	- [mimic-recording-studio](https://github.com/MycroftAI/mimic-recording-studio)
	- [coqui TTS](https://github.com/coqui-ai/TTS)

	The datasets below are distributed under the CC-BY 2.0 license, with the original text data provided by the Korea Information Society Development Institute's AI Hub, including Korean dialogue text data and Korean-English translation (parallel) corpus text data.

	- [Korean Corpus for Voice Recording](https://github.com/sce-tts/mimic-recording-studio/blob/master/backend/prompts/korean_corpus.csv)
	- [SleepingCE Speech Dataset](https://drive.google.com/file/d/1UpoBaZRTJXkTdsoemLBWV48QClm6hpTX/view?usp=sharing)
	- [Pre-trained Models for SleepingCE Speech Dataset (Glow-TTS)](https://drive.google.com/file/d/1DMKLdfZ_gzc_z0qDod6_G8fEXj0zCHvC/view?usp=sharing)
	- [Pre-trained Models for SleepingCE Speech Dataset (HiFi-GAN)](https://drive.google.com/file/d/1vRxp1RH-U7gSzWgyxnKY4h_7pB3tjPmU/view?usp=sharing)
	- These models were fine-tuned from the model provided by [coqui-ai/TTS](https://github.com/coqui-ai/TTS), trained on the [VCTK dataset](https://datashare.ed.ac.uk/handle/10283/3443), available [here](https://github.com/coqui-ai/TTS/releases/download/v0.0.12/vocoder_model--en--vctk--hifigan_v2.zip).