icecream0910 committed on
Commit
f5ced3a
•
1 Parent(s): 96e0a89

Upload 4 files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ train-hifigan-v2.ipynb filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,106 @@
1
- ---
2
- license: mit
3
- ---
1
+ # MyOwn-TTS
2
+ [![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
3
+
4
+ ## Description
5
+
6
+ MyOwn-TTS is a text-to-speech (TTS) system that reads sentences aloud in your own voice. This repository includes pre-trained models trained on the author's voice.
7
+
8
+ ## Table of Contents
9
+
10
+ - [Installation](#installation)
11
+ - [Usage](#usage)
12
+ - [Contributing](#contributing)
13
+ - [License](#license)
14
+
15
+ ## Installation
16
+
17
+ This README focuses on guiding you through the process of synthesizing speech using pre-trained models, rather than detailing the model training process.
18
+
19
+ 1. Clone the repository:
20
+
21
+ ```bash
22
+ git clone https://github.com/IceCream0910/myown-tts.git
23
+ ```
24
+
25
+ 2. Modify the `run-server.bat` batch file in the `/server` directory to match your actual file paths.
26
+
27
+ For example, if your server folder is at `C:\myown-tts\server`, update the file as follows:
28
+
29
+ ```bat
30
+ @echo off
31
+ setlocal
32
+ cd /D "%~dp0"
33
+ set MECAB_KO_DIC_PATH=.\mecab\mecab-ko-dic -r .\mecab\mecabrc
34
+ set TTS_MODEL_FILE=C:\myown-tts\server\models\glowtts-v2\best_model.pth.tar
35
+ set TTS_MODEL_CONFIG=C:\myown-tts\server\models\glowtts-v2\config.json
36
+ set VOCODER_MODEL_FILE=C:\myown-tts\server\models\hifigan-v2\best_model.pth.tar
37
+ set VOCODER_MODEL_CONFIG=C:\myown-tts\server\models\hifigan-v2\config.json
38
+ server.exe
39
+ endlocal
40
+ ```
41
+
42
+ 3. Update the `glowtts-v2/config.json` and `hifigan-v2/config.json` files in the `/server/models/` directory with your actual file paths.
43
+
44
+ JSON treats the backslash as an escape character, so write every backslash in a path as `\\`, as shown below:
45
+
46
+ - For `glowtts-v2/config.json`:
47
+ ```json
48
+ "stats_path": "C:\\myown-tts\\server\\models\\glowtts-v2\\scale_stats.npy"
49
+ ```
50
+
51
+ - For `hifigan-v2/config.json`:
52
+ ```json
53
+ "stats_path": "C:\\myown-tts\\server\\models\\hifigan-v2\\scale_stats.npy"
54
+ ```
55
+
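Hand-escaping Windows paths is easy to get wrong. As a minimal sketch (not part of this repository), Python's standard `json` module can produce the escaped value for you. The path is the example location from step 2, and `to_json_entry` is a hypothetical helper name.

```python
import json


def to_json_entry(key: str, windows_path: str) -> str:
    """Return a `"key": "value"` fragment with backslashes escaped for JSON."""
    # json.dumps doubles every backslash, which is the form config.json needs.
    return f"{json.dumps(key)}: {json.dumps(windows_path)}"


if __name__ == "__main__":
    # A raw string keeps Python itself from interpreting the backslashes.
    path = r"C:\myown-tts\server\models\glowtts-v2\scale_stats.npy"
    print(to_json_entry("stats_path", path))
```

Paste the printed fragment into the matching `config.json` in place of the existing `stats_path` entry.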
56
+ ## Usage
57
+
58
+ To start the TTS server, run `run-server.bat`. When the command prompt shows `INFO:werkzeug: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)`, the server is up and speech synthesis is available. To stop the server, press CTRL+C in the command prompt.
59
+
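If the server fails to start, a wrong path in `run-server.bat` is one possible cause. The following is a minimal sketch, not part of this repository, that checks whether the four model variables from the batch file point at existing files. `missing_model_files` is a hypothetical helper, and it assumes the variables are already set in the environment it runs in.

```python
import os
from pathlib import Path

# Variable names taken from the run-server.bat example in the Installation section.
MODEL_VARS = (
    "TTS_MODEL_FILE",
    "TTS_MODEL_CONFIG",
    "VOCODER_MODEL_FILE",
    "VOCODER_MODEL_CONFIG",
)


def missing_model_files(env=None):
    """Return the names of variables that are unset or do not point at a file."""
    env = os.environ if env is None else env
    missing = []
    for name in MODEL_VARS:
        value = env.get(name)
        if not value or not Path(value).is_file():
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_model_files()
    if problems:
        print("Check these variables in run-server.bat:", ", ".join(problems))
    else:
        print("All model paths exist.")
```

Passing a plain dictionary instead of the real environment makes the check easy to try without touching the batch file.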
60
+ ### API
61
+
62
+ - Text preprocessing: `/tts-server/api/process-text`
63
+
64
+ Splits the input into sentences and removes special characters, so that multi-line text can be synthesized one sentence at a time and played back in sequence.
65
+
66
+ - Text Inference: `/tts-server/api/infer-glowtts`
67
+
68
+ Synthesizes speech from text. Pass the text to synthesize in the `text` query parameter of the URL.
69
+
70
+ Example:
71
+ ```
72
+ http://localhost:5000/tts-server/api/infer-glowtts?text=hello
73
+ ```
74
+
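As a minimal sketch of calling the inference endpoint from a script, the snippet below builds the request URL with percent-encoding (needed for Korean text and spaces) and saves whatever bytes the server returns. The helper names and the `output.wav` file name are hypothetical, and the response is assumed to be audio data, which this README does not state.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # address shown in the example URL above


def build_infer_url(text: str, base_url: str = BASE_URL) -> str:
    """Build the infer-glowtts URL with the text safely percent-encoded."""
    return f"{base_url}/tts-server/api/infer-glowtts?{urlencode({'text': text})}"


def synthesize_to_file(text: str, out_path: str = "output.wav") -> None:
    """Request synthesis and write the raw response body to out_path.

    Assumption: the response body is audio data.
    """
    with urlopen(build_infer_url(text)) as response:
        data = response.read()
    with open(out_path, "wb") as f:
        f.write(data)


if __name__ == "__main__":
    # Only prints the URL; call synthesize_to_file() once the server is running.
    print(build_infer_url("hello"))
```

Building the query string with `urlencode` rather than string concatenation avoids broken requests when the text contains `&`, `?`, or non-ASCII characters.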
75
+ ### Text Inference Demo Page
76
+
77
+ Visit [http://localhost:5000/](http://localhost:5000/) for a demo.
78
+
79
+ ## Contributing
80
+
81
+ 1. Fork the repository (https://github.com/icecream0910/myown-tts/fork).
82
+ 2. Create a new branch: `git checkout -b feature/<featureName>`.
83
+ 3. Commit your changes: `git commit -am 'Add <featureName>'`.
84
+ 4. Push to the branch: `git push origin feature/<featureName>`.
85
+ 5. Submit a pull request.
86
+
87
+ ## License
88
+
89
+ This project is licensed under the [MIT License](LICENSE).
90
+
91
+ ## References
92
+
93
+ This implementation draws inspiration from the following repositories:
94
+
95
+ - [SCE-TTS](https://github.com/sce-tts)
96
+ - [g2pK](https://github.com/Kyubyong/g2pK)
97
+ - [mimic-recording-studio](https://github.com/MycroftAI/mimic-recording-studio)
98
+ - [coqui TTS](https://github.com/coqui-ai/TTS)
99
+
100
+ The datasets below are distributed under the CC-BY 2.0 license. Their source text comes from the Korean dialogue text data and the Korean-English translation (parallel) corpus text data provided through AI Hub by the National Information Society Agency (NIA).
101
+
102
+ - [Korean Corpus for Voice Recording](https://github.com/sce-tts/mimic-recording-studio/blob/master/backend/prompts/korean_corpus.csv)
103
+ - [SleepingCE Speech Dataset](https://drive.google.com/file/d/1UpoBaZRTJXkTdsoemLBWV48QClm6hpTX/view?usp=sharing)
104
+ - [Pre-trained Models for SleepingCE Speech Dataset (Glow-TTS)](https://drive.google.com/file/d/1DMKLdfZ_gzc_z0qDod6_G8fEXj0zCHvC/view?usp=sharing)
105
+ - [Pre-trained Models for SleepingCE Speech Dataset (HiFi-GAN)](https://drive.google.com/file/d/1vRxp1RH-U7gSzWgyxnKY4h_7pB3tjPmU/view?usp=sharing)
106
+ - These models were fine-tuned from the model provided by [coqui-ai/TTS](https://github.com/coqui-ai/TTS), trained on the [VCTK dataset](https://datashare.ed.ac.uk/handle/10283/3443), available [here](https://github.com/coqui-ai/TTS/releases/download/v0.0.12/vocoder_model--en--vctk--hifigan_v2.zip).
infer-v2.ipynb ADDED
@@ -0,0 +1 @@
1
+ {"cells":[{"cell_type":"markdown","metadata":{"id":"AvxXfBluyZvb"},"source":["## 1. Mount Google Drive\n","\n","Mount the Google Drive that holds the models trained for speech synthesis. \n","Make sure the following files exist in the Google Drive you mount.\n","\n","- `/Colab Notebooks/data/glowtts-v2/model_file.pth.tar`\n","- `/Colab Notebooks/data/glowtts-v2/config.json`\n","- `/Colab Notebooks/data/hifigan-v2/model_file.pth.tar`\n","- `/Colab Notebooks/data/hifigan-v2/config.json`\n","\n","\n","(If they do not exist, download [glowtts-v2.zip](https://drive.google.com/file/d/1DMKLdfZ_gzc_z0qDod6_G8fEXj0zCHvC/view?usp=sharing) and [hifigan-v2.zip](https://drive.google.com/file/d/1vRxp1RH-U7gSzWgyxnKY4h_7pB3tjPmU/view?usp=sharing) and put them in place.)\n","\n","If a message such as `Enter your authorization code:` appears below, \n","open the link printed with it, select the Google account to mount, then copy the authorization code and enter it."]},{"cell_type":"code","execution_count":1,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":230434,"status":"ok","timestamp":1707413202236,"user":{"displayName":"Cream Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"4U2wrDOthrsF","outputId":"1a56442b-2686-468a-bb32-0ba7608ef988"},"outputs":[{"name":"stdout","output_type":"stream","text":["Mounted at /content/drive\n"]}],"source":["from google.colab import drive\n","drive.mount('/content/drive')"]},{"cell_type":"markdown","metadata":{"id":"8erClGSnzwge"},"source":["## 2. 
Load required libraries and functions\n","\n","Load the libraries and functions needed to run the notebook.\n","\n","This step can take about 10 minutes."]},{"cell_type":"code","execution_count":2,"metadata":{"executionInfo":{"elapsed":18,"status":"ok","timestamp":1707413202238,"user":{"displayName":"Cream Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"jYCym6hXge2_"},"outputs":[],"source":["import os\n","import sys\n","from pathlib import Path"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"background_save":true,"base_uri":"https://localhost:8080/"},"id":"JkWG-L13gReB"},"outputs":[{"name":"stdout","output_type":"stream","text":["/content\n","fatal: destination path 'TTS' already exists and is not an empty directory.\n","fatal: destination path 'g2pK' already exists and is not an empty directory.\n","/content/TTS\n"," \u001b[1;31merror\u001b[0m: \u001b[1msubprocess-exited-with-error\u001b[0m\n"," \n"," \u001b[31m×\u001b[0m \u001b[32mpip subprocess to install build dependencies\u001b[0m did not run successfully.\n"," \u001b[31m│\u001b[0m exit code: \u001b[1;36m1\u001b[0m\n"," \u001b[31m╰─\u003e\u001b[0m See above for output.\n"," \n"," \u001b[1;35mnote\u001b[0m: This error originates from a subprocess, and is likely not a problem with pip.\n"," Installing build dependencies ... 
\u001b[?25l\u001b[?25herror\n","\u001b[1;31merror\u001b[0m: \u001b[1msubprocess-exited-with-error\u001b[0m\n","\n","\u001b[31m×\u001b[0m \u001b[32mpip subprocess to install build dependencies\u001b[0m did not run successfully.\n","\u001b[31m│\u001b[0m exit code: \u001b[1;36m1\u001b[0m\n","\u001b[31m╰─\u003e\u001b[0m See above for output.\n","\n","\u001b[1;35mnote\u001b[0m: This error originates from a subprocess, and is likely not a problem with pip.\n","/content/g2pK\n","\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m71.1/71.1 kB\u001b[0m \u001b[31m2.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n"]}],"source":["%cd /content\n","!git clone --depth 1 https://github.com/sce-tts/TTS.git -b sce-tts\n","!git clone --depth 1 https://github.com/sce-tts/g2pK.git\n","%cd /content/TTS\n","!pip install -q --no-cache-dir -e .\n","%cd /content/g2pK\n","!pip install -q --no-cache-dir \"pysbd\" \"konlpy\" \"jamo\" \"nltk\" \"python-mecab-ko\"\n","!pip install -q --no-cache-dir -e ."]},{"cell_type":"code","execution_count":4,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":2893,"status":"ok","timestamp":1707413252303,"user":{"displayName":"Cream Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"FUD8SfIxSY8j","outputId":"3b984ef6-c788-4694-84b1-2675d82ed0c3"},"outputs":[{"name":"stdout","output_type":"stream","text":["/content/g2pK\n"]},{"name":"stderr","output_type":"stream","text":["[nltk_data] Downloading package cmudict to /root/nltk_data...\n","[nltk_data] Unzipping corpora/cmudict.zip.\n"]}],"source":["%cd /content/g2pK\n","import g2pk\n","g2p = g2pk.G2p()"]},{"cell_type":"code","execution_count":5,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":524},"executionInfo":{"elapsed":15,"status":"error","timestamp":1707413252303,"user":{"displayName":"Cream 
Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"Lt9bLLZ8I4GH","outputId":"d0eba754-a986-47ee-d154-7233c2ef24c6"},"outputs":[{"name":"stdout","output_type":"stream","text":["/content/TTS\n"]},{"ename":"ModuleNotFoundError","evalue":"No module named 'pysbd'","output_type":"error","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)","\u001b[0;32m\u003cipython-input-5-73a1c4bbdabf\u003e\u001b[0m in \u001b[0;36m\u003ccell line: 7\u003e\u001b[0;34m()\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mIPython\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----\u003e 7\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mTTS\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mutils\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msynthesizer\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mSynthesizer\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 8\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mnormalize_text\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/content/TTS/TTS/utils/synthesizer.py\u001b[0m in \u001b[0;36m\u003cmodule\u003e\u001b[0;34m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mnumpy\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----\u003e 5\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mpysbd\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6\u001b[0m \u001b[0;32mimport\u001b[0m 
\u001b[0mtorch\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'pysbd'","","\u001b[0;31m---------------------------------------------------------------------------\u001b[0;32m\nNOTE: If your import is failing due to a missing package, you can\nmanually install dependencies using either !pip or !apt.\n\nTo view examples of installing some common dependencies, click the\n\"Open Examples\" button below.\n\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n"]}],"source":["%cd /content/TTS\n","import re\n","import sys\n","from unicodedata import normalize\n","import IPython\n","\n","from TTS.utils.synthesizer import Synthesizer\n","\n","def normalize_text(text):\n"," text = text.strip()\n","\n"," for c in \",;:\":\n"," text = text.replace(c, \".\")\n"," text = remove_duplicated_punctuations(text)\n","\n"," text = jamo_text(text)\n","\n"," text = g2p.idioms(text)\n"," text = g2pk.english.convert_eng(text, g2p.cmu)\n"," text = g2pk.utils.annotate(text, g2p.mecab)\n"," text = g2pk.numerals.convert_num(text)\n"," text = re.sub(\"/[PJEB]\", \"\", text)\n","\n"," text = alphabet_text(text)\n","\n"," # remove unreadable characters\n"," text = normalize(\"NFD\", text)\n"," text = \"\".join(c for c in text if c in symbols)\n"," text = normalize(\"NFC\", text)\n","\n"," text = text.strip()\n"," if len(text) == 0:\n"," return \"\"\n","\n"," # only single punctuation\n"," if text in '.!?':\n"," return punctuation_text(text)\n","\n"," # append punctuation if there is no punctuation at the end of the text\n"," if text[-1] not in '.!?':\n"," text += '.'\n","\n"," return text\n","\n","\n","def remove_duplicated_punctuations(text):\n"," text = re.sub(r\"[.?!]+\\?\", \"?\", text)\n"," text = re.sub(r\"[.?!]+!\", \"!\", text)\n"," text = re.sub(r\"[.?!]+\\.\", \".\", text)\n"," return text\n","\n","\n","def 
split_text(text):\n"," text = remove_duplicated_punctuations(text)\n","\n"," texts = []\n"," for subtext in re.findall(r'[^.!?\\n]*[.!?\\n]', text):\n"," texts.append(subtext.strip())\n","\n"," return texts\n","\n","\n","def alphabet_text(text):\n"," text = re.sub(r\"(a|A)\", \"에이\", text)\n"," text = re.sub(r\"(b|B)\", \"비\", text)\n"," text = re.sub(r\"(c|C)\", \"씨\", text)\n"," text = re.sub(r\"(d|D)\", \"디\", text)\n"," text = re.sub(r\"(e|E)\", \"이\", text)\n"," text = re.sub(r\"(f|F)\", \"에프\", text)\n"," text = re.sub(r\"(g|G)\", \"쥐\", text)\n"," text = re.sub(r\"(h|H)\", \"에이치\", text)\n"," text = re.sub(r\"(i|I)\", \"아이\", text)\n"," text = re.sub(r\"(j|J)\", \"제이\", text)\n"," text = re.sub(r\"(k|K)\", \"케이\", text)\n"," text = re.sub(r\"(l|L)\", \"엘\", text)\n"," text = re.sub(r\"(m|M)\", \"엠\", text)\n"," text = re.sub(r\"(n|N)\", \"엔\", text)\n"," text = re.sub(r\"(o|O)\", \"오\", text)\n"," text = re.sub(r\"(p|P)\", \"피\", text)\n"," text = re.sub(r\"(q|Q)\", \"큐\", text)\n"," text = re.sub(r\"(r|R)\", \"알\", text)\n"," text = re.sub(r\"(s|S)\", \"에스\", text)\n"," text = re.sub(r\"(t|T)\", \"티\", text)\n"," text = re.sub(r\"(u|U)\", \"유\", text)\n"," text = re.sub(r\"(v|V)\", \"브이\", text)\n"," text = re.sub(r\"(w|W)\", \"더블유\", text)\n"," text = re.sub(r\"(x|X)\", \"엑스\", text)\n"," text = re.sub(r\"(y|Y)\", \"와이\", text)\n"," text = re.sub(r\"(z|Z)\", \"지\", text)\n","\n"," return text\n","\n","\n","def punctuation_text(text):\n"," # punctuation marks\n"," text = re.sub(r\"!\", \"느낌표\", text)\n"," text = re.sub(r\"\\?\", \"물음표\", text)\n"," text = re.sub(r\"\\.\", \"마침표\", text)\n","\n"," return text\n","\n","\n","def jamo_text(text):\n"," # basic consonants and vowels\n"," text = re.sub(r\"ㄱ\", \"기역\", text)\n"," text = re.sub(r\"ㄴ\", \"니은\", text)\n"," text = re.sub(r\"ㄷ\", \"디귿\", text)\n"," text = re.sub(r\"ㄹ\", \"리을\", text)\n"," text = re.sub(r\"ㅁ\", \"미음\", text)\n"," text = 
re.sub(r\"ㅂ\", \"비읍\", text)\n"," text = re.sub(r\"ㅅ\", \"시옷\", text)\n"," text = re.sub(r\"ㅇ\", \"이응\", text)\n"," text = re.sub(r\"ㅈ\", \"지읒\", text)\n"," text = re.sub(r\"ㅊ\", \"치읓\", text)\n"," text = re.sub(r\"ㅋ\", \"키읔\", text)\n"," text = re.sub(r\"ㅌ\", \"티읕\", text)\n"," text = re.sub(r\"ㅍ\", \"피읖\", text)\n"," text = re.sub(r\"ㅎ\", \"히읗\", text)\n"," text = re.sub(r\"ㄲ\", \"쌍기역\", text)\n"," text = re.sub(r\"ㄸ\", \"쌍디귿\", text)\n"," text = re.sub(r\"ㅃ\", \"쌍비읍\", text)\n"," text = re.sub(r\"ㅆ\", \"쌍시옷\", text)\n"," text = re.sub(r\"ㅉ\", \"쌍지읒\", text)\n"," text = re.sub(r\"ㄳ\", \"기역시옷\", text)\n"," text = re.sub(r\"ㄵ\", \"니은지읒\", text)\n"," text = re.sub(r\"ㄶ\", \"니은히읗\", text)\n"," text = re.sub(r\"ㄺ\", \"리을기역\", text)\n"," text = re.sub(r\"ㄻ\", \"리을미음\", text)\n"," text = re.sub(r\"ㄼ\", \"리을비읍\", text)\n"," text = re.sub(r\"ㄽ\", \"리을시옷\", text)\n"," text = re.sub(r\"ㄾ\", \"리을티읕\", text)\n"," text = re.sub(r\"ㄿ\", \"리을피읍\", text)\n"," text = re.sub(r\"ㅀ\", \"리을히읗\", text)\n"," text = re.sub(r\"ㅄ\", \"비읍시옷\", text)\n"," text = re.sub(r\"ㅏ\", \"아\", text)\n"," text = re.sub(r\"ㅑ\", \"야\", text)\n"," text = re.sub(r\"ㅓ\", \"어\", text)\n"," text = re.sub(r\"ㅕ\", \"여\", text)\n"," text = re.sub(r\"ㅗ\", \"오\", text)\n"," text = re.sub(r\"ㅛ\", \"요\", text)\n"," text = re.sub(r\"ㅜ\", \"우\", text)\n"," text = re.sub(r\"ㅠ\", \"유\", text)\n"," text = re.sub(r\"ㅡ\", \"으\", text)\n"," text = re.sub(r\"ㅣ\", \"이\", text)\n"," text = re.sub(r\"ㅐ\", \"애\", text)\n"," text = re.sub(r\"ㅒ\", \"얘\", text)\n"," text = re.sub(r\"ㅔ\", \"에\", text)\n"," text = re.sub(r\"ㅖ\", \"예\", text)\n"," text = re.sub(r\"ㅘ\", \"와\", text)\n"," text = re.sub(r\"ㅙ\", \"왜\", text)\n"," text = re.sub(r\"ㅚ\", \"외\", text)\n"," text = re.sub(r\"ㅝ\", \"워\", text)\n"," text = re.sub(r\"ㅞ\", \"웨\", 
text)\n"," text = re.sub(r\"ㅟ\", \"위\", text)\n"," text = re.sub(r\"ㅢ\", \"의\", text)\n","\n"," return text\n","\n","\n","def normalize_multiline_text(long_text):\n"," texts = split_text(long_text)\n"," normalized_texts = [normalize_text(text).strip() for text in texts]\n"," return [text for text in normalized_texts if len(text) \u003e 0]\n","\n","def synthesize(text):\n"," wavs = synthesizer.tts(text, None, None)\n"," return wavs"]},{"cell_type":"markdown","metadata":{"id":"SbPRQfl8z28u"},"source":["## 3. Load the trained models\n","\n","Load the trained Glow-TTS and HiFi-GAN models.\n","\n","To load from a different checkpoint, edit the paths in the code cell below accordingly, for example:\n","\n","```python\n","synthesizer = Synthesizer(\n"," \"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2/glowtts-v2-May-31-2021_08+17AM-d897f2e/best_model.pth.tar\",\n"," \"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2/glowtts-v2-May-31-2021_08+17AM-d897f2e/config.json\",\n"," None,\n"," \"/content/drive/My Drive/Colab Notebooks/data/hifigan-v2/hifigan-v2-May-31-2021_08+26AM-d897f2e/checkpoint_300000.pth.tar\",\n"," \"/content/drive/My Drive/Colab Notebooks/data/hifigan-v2/hifigan-v2-May-31-2021_08+26AM-d897f2e/config.json\",\n"," None,\n"," None,\n"," False,\n",")\n","```"]},{"cell_type":"code","execution_count":null,"metadata":{"executionInfo":{"elapsed":10,"status":"aborted","timestamp":1707413252304,"user":{"displayName":"Cream Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"zwROk8zUHgUn"},"outputs":[],"source":["synthesizer = Synthesizer(\n"," \"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2/glowtts-v2-April-17-2022_04+46AM-3aa165a/best_model.pth.tar\",\n"," \"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2/glowtts-v2-April-17-2022_04+46AM-3aa165a/config.json\",\n"," None,\n"," \"/content/drive/My Drive/Colab 
Notebooks/data/hifigan-v2/hifigan-v2-April-16-2022_11+57AM-3aa165a/best_model.pth.tar\",\n"," \"/content/drive/My Drive/Colab Notebooks/data/hifigan-v2/hifigan-v2-April-16-2022_11+57AM-3aa165a/config.json\",\n"," None,\n"," None,\n"," False,\n",")\n","symbols = synthesizer.tts_config.characters.characters"]},{"cell_type":"markdown","metadata":{"id":"tmjT_BrV0XYD"},"source":["## 4. Speech synthesis\n","\n","Run the actual speech synthesis."]},{"cell_type":"code","execution_count":null,"metadata":{"executionInfo":{"elapsed":11,"status":"aborted","timestamp":1707413252305,"user":{"displayName":"Cream Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"XSnF1D48F1tx"},"outputs":[],"source":["texts = \"\"\"\n","\n","\"\"\"\n","for text in normalize_multiline_text(texts):\n"," wav = synthesizer.tts(text, None, None)\n"," IPython.display.display(IPython.display.Audio(wav, rate=22050))"]}],"metadata":{"colab":{"collapsed_sections":["8erClGSnzwge"],"name":"","provenance":[{"file_id":"1YkxjzBz3V4eXoAaEgcFNEUg8ZyWV40x9","timestamp":1650109450650},{"file_id":"13pqat2mWsMha7Vn_-Q5_Ih8MDkvz3q5a","timestamp":1622375316346},{"file_id":"1IlZt42ETvNHthRFXfwNSSH-ftWthxzqr","timestamp":1596336131977},{"file_id":"1UinTd1Kp1ytwPQ4QWA610ZKOVfmPDdn5","timestamp":1596300568469}],"version":""},"kernelspec":{"display_name":"Python 3","name":"python3"}},"nbformat":4,"nbformat_minor":0}
train-glowtts-v2.ipynb ADDED
@@ -0,0 +1 @@
1
+ {"cells":[{"cell_type":"markdown","metadata":{"id":"yIgz4PyCC9eY"},"source":["# Glow-TTS Training\n","Run Glow-TTS training.\n","This model determines the speaking style and timbre of the voice."]},{"cell_type":"markdown","metadata":{"id":"nMJBiJ6mECO1"},"source":["## 1. Check the GPU assigned to the runtime\n","\n","If the message `GPU: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.` is printed, click `Runtime -> Change runtime type` in the top menu, set the hardware accelerator to `GPU`, save, and run the cell again."]},{"cell_type":"code","execution_count":1,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":742,"status":"ok","timestamp":1685149753708,"user":{"displayName":"Cream Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"pHat88bRD4_e","outputId":"ae42575e-4d76-4b02-d10f-8f7d855819b8"},"outputs":[{"name":"stdout","output_type":"stream","text":["GPU: Tesla T4\n"]}],"source":["import os\n","GPU_NAME = os.popen('nvidia-smi --query-gpu=name --format=csv,noheader').read().strip()\n","os.environ['GPU_NAME'] = GPU_NAME\n","print(f'GPU: {GPU_NAME}')"]},{"cell_type":"markdown","metadata":{"id":"CEgknkRoDKj9"},"source":["## 2. 
Mount Google Drive\n","\n","The following file must exist in the Google Drive you mount:\n","\n","- `/Colab Notebooks/data/filelists.zip`"]},{"cell_type":"code","execution_count":2,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":24329,"status":"ok","timestamp":1685149778033,"user":{"displayName":"Cream Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"4U2wrDOthrsF","outputId":"2f3187bb-05e1-48f3-95f0-905f395f81e3"},"outputs":[{"name":"stdout","output_type":"stream","text":["Mounted at /content/drive\n"]}],"source":["from google.colab import drive\n","drive.mount('/content/drive')"]},{"cell_type":"markdown","metadata":{"id":"IxpzRw3SDvOL"},"source":["## 3. Load required libraries and functions\n"]},{"cell_type":"code","execution_count":3,"metadata":{"executionInfo":{"elapsed":6,"status":"ok","timestamp":1685149778033,"user":{"displayName":"Cream Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"jYCym6hXge2_"},"outputs":[],"source":["import sys\n","from pathlib import Path"]},{"cell_type":"code","execution_count":4,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":2765,"status":"ok","timestamp":1685149780793,"user":{"displayName":"Cream Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"JkWG-L13gReB","outputId":"c820f3d3-43d7-436a-bd25-1f8c1ee72148"},"outputs":[{"name":"stdout","output_type":"stream","text":["/content\n","Cloning into 'TTS'...\n","remote: Enumerating objects: 447, done.\u001b[K\n","remote: Counting objects: 100% (447/447), done.\u001b[K\n","remote: Compressing objects: 100% (413/413), done.\u001b[K\n","remote: Total 447 (delta 56), reused 222 (delta 22), pack-reused 0\u001b[K\n","Receiving objects: 100% (447/447), 13.77 MiB | 17.84 MiB/s, done.\n","Resolving deltas: 100% (56/56), done.\n","/content/TTS\n","/content/TTS/setup.py:15: DeprecationWarning: distutils Version classes are deprecated. 
Use packaging.version instead.\n"," if LooseVersion(sys.version) < LooseVersion(\"3.6\") or LooseVersion(sys.version) > LooseVersion(\"3.9\"):\n","Traceback (most recent call last):\n"," File \"/content/TTS/setup.py\", line 16, in <module>\n"," raise RuntimeError(\n","RuntimeError: TTS requires python >= 3.6 and <3.9 but your Python version is 3.10.11 (main, Apr 5 2023, 14:15:10) [GCC 9.4.0]\n"]}],"source":["%cd /content\n","!git clone --depth 1 https://github.com/sce-tts/TTS.git -b sce-tts\n","%cd /content/TTS\n","!python setup.py develop"]},{"cell_type":"markdown","metadata":{"id":"iiXsxJtZERyP"},"source":["## 4. Load the training dataset\n","\n","Fetch the voice data used for training from Google Drive.\n","\n","Requires a `filelists.zip` generated with mimic recording studio."]},{"cell_type":"code","execution_count":5,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":5166,"status":"ok","timestamp":1685149785956,"user":{"displayName":"Cream Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"hExnC_2RhZ3m","outputId":"ee65017a-9473-4ec8-f676-0ee937202b6b"},"outputs":[{"name":"stdout","output_type":"stream","text":["/content/TTS\n"]}],"source":["%cd /content/TTS\n","!cp \"/content/drive/My Drive/Colab Notebooks/data/filelists.zip\" ./filelists.zip\n","!rm -rf ./filelists\n","!unzip -q filelists.zip -d ./filelists"]},{"cell_type":"markdown","metadata":{"id":"qD8zd4SMElbn"},"source":["## 5. 
Load pre-trained data\n","\n","\n","> If no pre-trained data exists in Google Drive, pre-trained data published by someone else is downloaded.\n","\n"]},{"cell_type":"code","execution_count":6,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":32,"status":"ok","timestamp":1685149785960,"user":{"displayName":"Cream Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"MQxazTNayds-","outputId":"c3cf0aa3-b69e-479f-92b0-e51c93f541fa"},"outputs":[{"name":"stdout","output_type":"stream","text":["/content/TTS\n"]}],"source":["%cd /content/TTS\n","!mkdir -p \"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2\"\n","if not Path(\"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2/config.json\").exists():\n"," !gdown --id 1DMKLdfZ_gzc_z0qDod6_G8fEXj0zCHvC -O glowtts-v2.zip\n"," !unzip -q glowtts-v2.zip -d ./\n"," !cp -R ./glowtts-v2/* \"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2/\""]},{"cell_type":"code","execution_count":7,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":29,"status":"ok","timestamp":1685149785961,"user":{"displayName":"Cream Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"c73wfE5L7uK9","outputId":"859725c8-5287-4b3e-f6d1-bfab3772e250"},"outputs":[{"name":"stdout","output_type":"stream","text":["/content/TTS\n"]}],"source":["%cd /content/TTS\n","if not Path(\"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2/scale_stats_new.npy\").exists():\n"," !python TTS/bin/compute_statistics.py \"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2/config.json\" \"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2/scale_stats_new.npy\" --data_path \"/content/TTS/filelists/wavs/\""]},{"cell_type":"code","execution_count":8,"metadata":{"executionInfo":{"elapsed":26,"status":"ok","timestamp":1685149785961,"user":{"displayName":"Cream Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"Q6TCF3Pu-MnV"},"outputs":[],"source":["with 
open(\"/content/TTS/test_sentences.txt\", mode=\"w\") as f:\n"," f.write(\"\"\"아래 문장들은 모델 학습을 위해 사용하지 않은 문장들입니다.\n","서울특별시 특허허가과 허가과장 허과장.\n","경찰청 철창살은 외철창살이고 검찰청 철창살은 쌍철창살이다.\n","지향을 지양으로 오기하는 일을 지양하는 언어 습관을 지향해야 한다.\n","그러니까 외계인이 우리 생각을 읽고 우리 생각을 우리가 다시 생각토록 해서 그 생각이 마치 우리가 생각한 것인 것처럼 속였다는 거냐?\"\"\")"]},{"cell_type":"markdown","metadata":{"id":"alQe2KpbE9di"},"source":["## 6. Run TensorBoard\n","\n","Run TensorBoard to monitor training progress.\n","\n","Click the settings button and enable auto reload to refresh the view automatically every 30 seconds."]},{"cell_type":"code","execution_count":9,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":820},"executionInfo":{"elapsed":5674,"status":"ok","timestamp":1685149791608,"user":{"displayName":"Cream Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"ydwAZhCQilzJ","outputId":"a3296d49-99d3-4883-9939-ce14229a6367"},"outputs":[{"data":{"application/javascript":"\n (async () => {\n const url = new URL(await google.colab.kernel.proxyPort(6006, {'cache': true}));\n url.searchParams.set('tensorboardColab', 'true');\n const iframe = document.createElement('iframe');\n iframe.src = url;\n iframe.setAttribute('width', '100%');\n iframe.setAttribute('height', '800');\n iframe.setAttribute('frameborder', 0);\n document.body.appendChild(iframe);\n })();\n ","text/plain":["<IPython.core.display.Javascript object>"]},"metadata":{},"output_type":"display_data"}],"source":["%load_ext tensorboard\n","%tensorboard --logdir=\"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2\""]},{"cell_type":"markdown","metadata":{"id":"32XUNFa-FQ-R"},"source":["## 7. 
Run Glow-TTS training\n","\n","When training proceeds normally, this cell does not finish and stays in the running state.\n","\n","To resume training a model that you trained previously, edit the cell as follows before running it.\n","\n","- Uncomment lines 2 and 3 of the cell below.\n","- Change the path on line 3 to the path of the model whose training you want to resume. \n","(Example: `/content/drive/My Drive/Colab Notebooks/data/glowtts-v2/glowtts-v2-May-31-2021_08+17AM-d897f2e`)\n","- Remove the code below line 4.\n"]},{"cell_type":"code","execution_count":10,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":8529,"status":"ok","timestamp":1685149800131,"user":{"displayName":"Cream Ice","userId":"02668969734157440879"},"user_tz":-540},"id":"9Yim0zgJk3cR","outputId":"831e6d6a-c0c3-4109-f266-031d0750c13a"},"outputs":[{"name":"stdout","output_type":"stream","text":["/content/TTS\n","Traceback (most recent call last):\n"," File \"/content/TTS/TTS/bin/train_glow_tts.py\", line 17, in <module>\n"," from TTS.tts.datasets.preprocess import load_meta_data\n","ModuleNotFoundError: No module named 'TTS'\n","Traceback (most recent call last):\n"," File \"/content/TTS/TTS/bin/train_glow_tts.py\", line 17, in <module>\n"," from TTS.tts.datasets.preprocess import load_meta_data\n","ModuleNotFoundError: No module named 'TTS'\n"]}],"source":["%cd /content/TTS\n","!(python TTS/bin/train_glow_tts.py \\\n"," -continue_path \"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2/glowtts-v2-April-17-2022_04+46AM-3aa165a/checkpoint_32000.pth.tar\")\n","!(python TTS/bin/train_glow_tts.py \\\n"," --config_path \"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2/glowtts-v2-April-17-2022_04+46AM-3aa165a/config.json\" \\\n"," --coqpit.datasets.0.path \"/content/TTS/filelists\" \\\n"," --coqpit.audio.stats_path \"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2/scale_stats_new.npy\" \\\n"," 
--coqpit.test_sentences_file \"/content/TTS/test_sentences.txt\" \\\n"," --coqpit.output_path \"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2/\" \\\n"," --coqpit.num_loader_workers 2 \\\n"," --coqpit.num_val_loader_workers 2 \\\n"," --restore_path \"/content/drive/My Drive/Colab Notebooks/data/glowtts-v2/model_file.pth.tar\")"]}],"metadata":{"accelerator":"GPU","colab":{"provenance":[{"file_id":"1L5o8joH8LDV37eupNUpqqWrOcw1sGCit","timestamp":1650106813939},{"file_id":"1IlZt42ETvNHthRFXfwNSSH-ftWthxzqr","timestamp":1622371446894},{"file_id":"1UinTd1Kp1ytwPQ4QWA610ZKOVfmPDdn5","timestamp":1596300568469}]},"kernelspec":{"display_name":"Python 3","name":"python3"}},"nbformat":4,"nbformat_minor":0}
train-hifigan-v2.ipynb ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f6b5aa7fd7c485d9800f1e446b96a510a2a15998f419408339ccfbebd86aaf53
3
+ size 11867103