94insane committed on
Commit
75f9c7a
•
1 Parent(s): 320e69e

Update README.md

Files changed (1)
  1. README.md +12 -134
README.md CHANGED
@@ -1,19 +1,13 @@
 
 
 
 
 
 
  # Korean FastSpeech 2 - Pytorch Implementation
 
- ![](./assets/model.png)
- # Introduction
 
- FastSpeech2 is a model that improves on the slow training and synthesis speed of earlier autoregressive models. It is non-autoregressive, and its Variance Adaptor uses variance information to raise the accuracy of speech prediction.
- In other words, it extends models that predict from audio-text pairs alone by adding pitch, energy, and duration.
- In FastSpeech2, durations are extracted with MFA (Montreal Forced Aligner). The alignment between phonemes and speech is built from these extracted durations.
-
-
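How per-phoneme durations define an alignment can be sketched as follows. This is a minimal illustration, not code from this repository: the function name `expand_by_duration`, the sample phonemes, and the frame counts are all made up for the example. Each phoneme is simply repeated once per frame assigned to it, which is the basic idea behind duration-based length regulation in FastSpeech-style models.

```python
def expand_by_duration(phonemes, durations):
    """Repeat each phoneme once for every frame assigned to it.

    Hypothetical helper for illustration: `durations[i]` is the number
    of spectrogram frames aligned to `phonemes[i]`.
    """
    if len(phonemes) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    frames = []
    for phoneme, n_frames in zip(phonemes, durations):
        frames.extend([phoneme] * n_frames)
    return frames


# Two phonemes covering 2 and 3 frames give a 5-frame sequence.
print(expand_by_duration(["h", "i"], [2, 3]))  # prints ['h', 'h', 'i', 'i', 'i']
```

The total output length equals the sum of the durations, so accurate durations from the aligner matter for matching the phoneme sequence to the frame sequence.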
- * This repository: https://github.com/HGU-DLLAB/Korean-FastSpeech2-Pytorch
-
-
-
- # Install Dependencies
  python=3.9,
  [pytorch](https://pytorch.org/)=1.13, [ffmpeg](https://ffmpeg.org/) [g2pk](https://github.com/Kyubyong/g2pK)
  ```
@@ -23,129 +17,13 @@ pip install g2pk
  pip install -r requirements.txt
  ```
 
- # Preprocessing
-
- ### Step 1
- MFA (Montreal Forced Aligner) is used to extract the durations that FastSpeech2 training requires. MFA aligns each utterance (audio file) with its phoneme sequence and saves the result as a TextGrid file.
-
-
- 1. Create wav-lab pairs
-
- Each wav file needs a lab file containing the transcript of its utterance.
-
-
- This function reads the wav file names and text from the metadata and creates a transcript file (.lab) that differs from the wav file only in its extension.
-
-
- ![capture1](https://user-images.githubusercontent.com/63226383/117935760-0568d500-b33f-11eb-857e-6024ed7a5421.PNG)
-
- When this step finishes, wav-lab pairs should exist in the form shown above.
-
-
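The wav-lab pairing step can be sketched roughly as below. This is a hedged illustration, not the repository's helper: the function name `write_lab_files`, the `|` separator, and the default column positions are assumptions and have to be adapted to the actual metadata layout.

```python
from pathlib import Path


def write_lab_files(metadata_path, wav_root, wav_col=0, text_col=1, sep="|"):
    """Write a .lab transcript next to every wav listed in the metadata.

    Hypothetical helper: the separator and column positions are
    assumptions and must match your own metadata file.
    """
    wav_root = Path(wav_root)
    written = []
    with open(metadata_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            parts = line.split(sep)
            wav_path = wav_root / parts[wav_col]
            # Same directory and stem as the wav; only the extension differs.
            lab_path = wav_path.with_suffix(".lab")
            lab_path.parent.mkdir(parents=True, exist_ok=True)
            lab_path.write_text(parts[text_col].strip() + "\n", encoding="utf-8")
            written.append(lab_path)
    return written
```

Calling `write_lab_files("transcript.txt", "dataset/")` on a file with lines such as `1/1_0000.wav|some text` would create `dataset/1/1_0000.lab` beside the wav; both file names here are placeholders.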
- 2. Create the lexicon file
-
- Create a lexicon file that records the phonemes for every utterance in your dataset.
-
- Run the make_p_dict and make_lexicon functions in the [processing_utils.ipynb](https://github.com/JH-lee95/Fastspeech2-Korean/blob/master/processing_utils.ipynb) notebook, in that order.
-
- ![1](https://user-images.githubusercontent.com/63226383/117945618-7614ef00-b349-11eb-8e54-8d1a98bc0dab.PNG)
-
- When this step finishes, a p_lexicon.txt file in the form shown above is created.
-
-
- 3. Install MFA
-
- * For detailed MFA installation instructions, see https://montreal-forced-aligner.readthedocs.io/en/latest/installation.html
-
-
- 4. Run MFA
-
- MFA provides a pre-trained Korean acoustic model and g2p model. However, those models produce English phonemes, so to produce Korean phonemes you have to train MFA yourself.
-
- Once MFA is installed, run the following command.
-
- ```
- mfa train <dataset path> <p_lexicon path> <out directory>
- ```
-
- If MFA runs successfully, it produces TextGrid files in the following form.
- ![capture](https://user-images.githubusercontent.com/63226383/117936797-3d244c80-b340-11eb-89d0-699f3499e8e8.PNG)
-
-
-
- **(3) Data preprocessing**
-
- 1. hparams.py
-
- - dataset : name of the dataset folder
- - data_path : parent folder of dataset
- - meta_name : file name of the metadata, e.g. transcript.v.1.4.txt
- - textgrid_path : location of the compressed TextGrid archive (compress the TextGrid files beforehand)
- - textgrid_name : file name of the compressed TextGrid archive
-
- 2. preprocess.py
-
- ![capture](https://user-images.githubusercontent.com/63226383/117941734-58458b00-b345-11eb-9fa8-47fc74c7a844.PNG)
-
- Change this part to match the name of your own dataset.
-
-
- 3. data/kss.py
-
- - line 19 : basename,text = parts[?],parts[?] # the positions of the wav name and the text in a metadata line after splitting on "|"
- - line 37 : basename,text = parts[?],parts[?]
-
-
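Which indices replace the two `parts[?]` placeholders depends on where the wav name and the text sit in each metadata line. A small sketch for a single line, assuming a hypothetical layout `wav path|text|...`; the helper name `parse_metadata_line`, the default indices, and the sample line are illustrative and are not taken from data/kss.py.

```python
import os


def parse_metadata_line(line, wav_idx=0, text_idx=1):
    """Return (basename, text) for one '|'-separated metadata line.

    wav_idx and text_idx play the role of the parts[?] indices; the
    defaults are assumptions for illustration only.
    """
    parts = line.strip().split("|")
    # Drop any directory prefix and the .wav extension from the file field.
    basename = os.path.splitext(os.path.basename(parts[wav_idx]))[0]
    text = parts[text_idx]
    return basename, text


print(parse_metadata_line("1/1_0000.wav|hello world|extra field"))
# prints ('1_0000', 'hello world')
```

Printing the split of one real metadata line this way is a quick check that the chosen indices pick out the wav name and the transcript before running the full preprocessing.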
- Once all the changes above are done, run the following command.
-
- ```
- python preprocess.py
- ```
-
- # Train
- Before training the model, [download](https://drive.google.com/file/d/1GxaLlTrEhq0aXFvd_X1f4b-ev7-FH8RB/view?usp=sharing) the VocGAN (neural vocoder) pre-trained on the kss dataset and place it under ``vocoder/pretrained_models/``.
-
- Next, run the following command to train the model.
- ```
- python train.py
- ```
- Trained models are saved to ``ckpt/`` and tensorboard logs to ``log/``. Audio generated during the evaluate step of training is saved to the ``eval/`` folder.
-
- # Synthesis
- The command for generating speech from the trained parameters is as follows.
- ```
- python synthesis.py --step 500000
- ```
- The synthesized audio can be found in the ```results/``` directory.
-
- # Pretrained model
- [Download](https://drive.google.com/file/d/1qkFuNLqPIm-A5mZZDPGK1mnp0_Lh00PN/view?usp=sharing) the pretrained model (checkpoint).
- Then place it at the path stored in the ```checkpoint_path``` variable in ```hparams.py``` to use the pretrained model.
-
-
- # Fine-Tuning
- When fine-tuning from the pretrained model, at least 30 minutes of data is recommended. In an experiment with about 10 minutes of data, the voice and pronunciation were imitated fairly closely, but the output was very noisy.
-
- Fine-tuning requires adjusting the learning rate. It needs to be reasonably low, and the right value has to be found empirically. (I used the learning rate from the final step.)
-
- ```
- python train.py --restore_step 350000
- ```
-
-
-
- # Tensorboard
- ```
- tensorboard --logdir log/hp.dataset/
- ```
- Tensorboard logs are saved in the ```log/hp.dataset/``` directory, so you can run tensorboard with the command above to monitor training progress.
-
-
 
  # References
  - [FastSpeech 2: Fast and High-Quality End-to-End Text to Speech](https://arxiv.org/abs/2006.04558), Y. Ren, *et al*.
- - [FastSpeech: Fast, Robust and Controllable Text to Speech](https://arxiv.org/abs/1905.09263), Y. Ren, *et al*.
- - [ming024's FastSpeech2 implementation](https://github.com/ming024/FastSpeech2)
- - [rishikksh20's VocGAN implementation](https://github.com/rishikksh20/VocGAN)
  - [HGU-DLLAB](https://github.com/HGU-DLLAB/Korean-FastSpeech2-Pytorch)
- - [TensorSpeech](https://github.com/TensorSpeech/TensorFlowTTS)
 
 
+ ---
+ license: mit
+ language:
+ - ko
+ ---
 
  # Korean FastSpeech 2 - Pytorch Implementation
 
 
+ # Dependencies
  python=3.9,
  [pytorch](https://pytorch.org/)=1.13, [ffmpeg](https://ffmpeg.org/) [g2pk](https://github.com/Kyubyong/g2pK)
  ```
 
  pip install -r requirements.txt
  ```
 
+ # Usage
+ Data preprocessing
+ Train VocGAN model
+ Train Fastspeech2 model
 
  # References
  - [FastSpeech 2: Fast and High-Quality End-to-End Text to Speech](https://arxiv.org/abs/2006.04558), Y. Ren, *et al*.
  - [HGU-DLLAB](https://github.com/HGU-DLLAB/Korean-FastSpeech2-Pytorch)
+ - [rishikksh20's VocGAN implementation](https://github.com/rishikksh20/VocGAN)
+