metadata
license: mit
language:
  - ko
pipeline_tag: text-to-speech

Taein-TTS


Description

Taein-TTS is a project aimed at creating a text-to-speech (TTS) system that reads sentences in my own voice. This repository includes pre-trained models that have been trained using my voice.


Installation

This README explains how to synthesize speech with the pre-trained models. It does not cover model training.

  1. Clone the huggingface repository: https://huggingface.co/icecream0910/taein-tts

  2. Modify the run-server.bat batch file in the /server directory to match your actual file paths.

    For example, if your server folder is at C:\myown-tts\server, update the file as follows:

    @echo off
    setlocal
    cd /D "%~dp0"
    set MECAB_KO_DIC_PATH=.\mecab\mecab-ko-dic -r .\mecab\mecabrc
    set TTS_MODEL_FILE=C:\myown-tts\server\models\glowtts-v2\best_model.pth.tar
    set TTS_MODEL_CONFIG=C:\myown-tts\server\models\glowtts-v2\config.json
    set VOCODER_MODEL_FILE=C:\myown-tts\server\models\hifigan-v2\best_model.pth.tar
    set VOCODER_MODEL_CONFIG=C:\myown-tts\server\models\hifigan-v2\config.json
    server.exe
    endlocal
    
  3. Update the glowtts-v2/config.json and hifigan-v2/config.json files in the /server/models/ directory with your actual file paths.

    JSON treats a single backslash as an escape character, so write each backslash in the file paths as \\, as shown below:

    • For glowtts-v2/config.json:
    "stats_path": "C:\\myown-tts\\server\\models\\glowtts-v2\\scale_stats.npy"
    
    • For hifigan-v2/config.json:
    "stats_path": "C:\\myown-tts\\server\\models\\hifigan-v2\\scale_stats.npy"
    

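If you would rather not edit the JSON by hand, the stats_path update in step 3 can be scripted. The following is a minimal Python sketch, not part of this repository: set_stats_path and replace_key are hypothetical helper names, and the sketch assumes only that each config.json is valid JSON containing a "stats_path" key at some nesting level. Because the file is written with json.dumps, each backslash is escaped as \\ automatically.

```python
import json
from pathlib import Path


def replace_key(node, key, value):
    """Set every occurrence of `key` in a nested JSON structure to `value`."""
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
            else:
                replace_key(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            replace_key(item, key, value)


def set_stats_path(config_file, stats_file):
    """Point the "stats_path" entry of a model config.json at `stats_file`."""
    config_file = Path(config_file)
    config = json.loads(config_file.read_text(encoding="utf-8"))
    replace_key(config, "stats_path", str(stats_file))
    # json.dumps writes each backslash in the path as \\ in the file.
    config_file.write_text(
        json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8"
    )
```

For example, calling set_stats_path(r"C:\myown-tts\server\models\glowtts-v2\config.json", r"C:\myown-tts\server\models\glowtts-v2\scale_stats.npy") would update the Glow-TTS config; substitute your own paths, and repeat for hifigan-v2.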
Usage

To start the TTS server, run run-server.bat. When the server is ready, the command prompt shows the message INFO:werkzeug: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit), and speech synthesis is available through the server. To stop the server, press CTRL+C in the command prompt.
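If the server exits immediately, a wrong path in run-server.bat is a likely cause. As a rough pre-flight check, the Python sketch below reads the set lines of a batch file and reports which model files are missing. It is an illustration only: parse_set_lines and missing_model_files are hypothetical helpers, and the sketch assumes the batch file uses plain set NAME=VALUE lines like the example in the installation steps.

```python
import os
import re

# Matches "set NAME=VALUE" but not "setlocal" or "endlocal".
SET_LINE = re.compile(r"^\s*set\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$", re.IGNORECASE)


def parse_set_lines(batch_text: str) -> dict:
    """Collect NAME=VALUE pairs from the `set` lines of a batch file."""
    variables = {}
    for line in batch_text.splitlines():
        match = SET_LINE.match(line)
        if match:
            variables[match.group(1)] = match.group(2).strip()
    return variables


def missing_model_files(variables: dict) -> list:
    """Return the TTS_/VOCODER_ variable names whose file does not exist."""
    return [
        name
        for name, value in variables.items()
        if name.startswith(("TTS_", "VOCODER_")) and not os.path.isfile(value)
    ]
```

One way to use it is to read run-server.bat into a string, pass it to parse_set_lines, and print the result of missing_model_files; an empty list means every model and config path points at an existing file.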

API

  • Text preprocessing: /tts-server/api/process-text

    Splits the input into sentences and removes special characters, so that multi-line text can be synthesized sentence by sentence and played back in order as you type.

  • Text inference: /tts-server/api/infer-glowtts

    Synthesizes speech from text. Pass the text to synthesize in the text query parameter of the URL.

    Example:

    http://localhost:5000/tts-server/api/infer-glowtts?text=hello
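Korean text must be percent-encoded before it goes into the URL. The Python sketch below shows one way to build the request with the standard library. It is not part of this repository: infer_url and synthesize are hypothetical helpers, and the sketch assumes the endpoint returns audio bytes (for example WAV) in the response body, which this README does not specify.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # address printed by the server at startup


def infer_url(text: str, base_url: str = BASE_URL) -> str:
    """Build the infer-glowtts URL with `text` percent-encoded as UTF-8."""
    query = urlencode({"text": text})
    return f"{base_url}/tts-server/api/infer-glowtts?{query}"


def synthesize(text: str, out_path: str) -> None:
    """Request speech for `text` and save the raw response body.

    Assumption: the response body is the synthesized audio.
    """
    with urlopen(infer_url(text)) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

With the server running, synthesize("안녕하세요", "hello.wav") would request the audio and write it to hello.wav; the file extension here is a guess and should match whatever format the server actually returns.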
    

Text Inference Demo Page

While the server is running, open http://localhost:5000/ in a browser to try the demo page.

Contributing

  1. Fork the repository (https://github.com/icecream0910/myown-tts/fork).
  2. Create a new branch: git checkout -b feature/<featureName>.
  3. Commit your changes: git commit -am 'Add <featureName>'.
  4. Push to the branch: git push origin feature/<featureName>.
  5. Submit a pull request.

License

This project is licensed under the MIT License.

References

This implementation draws inspiration from the following repositories:

The datasets below are distributed under the CC-BY 2.0 license, with the original text data provided by the Korea Information Society Development Institute's AI Hub, including Korean dialogue text data and Korean-English translation (parallel) corpus text data.