Siddhant committed on
Commit
2da0a6b
1 Parent(s): 4a56606

import from zenodo

Files changed (32)
  1. README.md +50 -0
  2. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/config.yaml +449 -0
  3. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/adv_loss.png +0 -0
  4. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/discriminator_backward_time.png +0 -0
  5. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/discriminator_forward_time.png +0 -0
  6. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/discriminator_loss.png +0 -0
  7. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/discriminator_optim_step_time.png +0 -0
  8. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/discriminator_train_time.png +0 -0
  9. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/duration_loss.png +0 -0
  10. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/energy_loss.png +0 -0
  11. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/fake_loss.png +0 -0
  12. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/feat_match_loss.png +0 -0
  13. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/generator_backward_time.png +0 -0
  14. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/generator_forward_time.png +0 -0
  15. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/generator_optim_step_time.png +0 -0
  16. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/generator_train_time.png +0 -0
  17. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/gpu_max_cached_mem_GB.png +0 -0
  18. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/iter_time.png +0 -0
  19. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/l1_loss.png +0 -0
  20. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/loss.png +0 -0
  21. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/mel_loss.png +0 -0
  22. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/optim0_lr0.png +0 -0
  23. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/optim1_lr0.png +0 -0
  24. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/pitch_loss.png +0 -0
  25. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/real_loss.png +0 -0
  26. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/text2mel_loss.png +0 -0
  27. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/train_time.png +0 -0
  28. exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/train.total_count.ave_5best.pth +3 -0
  29. exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/energy_stats.npz +0 -0
  30. exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/feats_stats.npz +0 -0
  31. exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/pitch_stats.npz +0 -0
  32. meta.yaml +8 -0
README.md ADDED
@@ -0,0 +1,50 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - text-to-speech
+ language: en
+ datasets:
+ - ljspeech
+ license: cc-by-4.0
+ ---
+ ## ESPnet2 TTS pretrained model
+ ### `kan-bayashi/ljspeech_joint_finetune_conformer_fastspeech2_hifigan`
+ ♻️ Imported from https://zenodo.org/record/5498896/
+
+ This model was trained by kan-bayashi using ljspeech/tts1 recipe in [espnet](https://github.com/espnet/espnet/).
+ ### Demo: How to use in ESPnet2
+ ```python
+ # coming soon
+ ```
+ ### Citing ESPnet
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
+ year={2018},
+ booktitle={Proceedings of Interspeech},
+ pages={2207--2211},
+ doi={10.21437/Interspeech.2018-1456},
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+ @inproceedings{hayashi2020espnet,
+ title={{Espnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
+ author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
+ booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
+ pages={7654--7658},
+ year={2020},
+ organization={IEEE}
+ }
+ ```
+ or arXiv:
+ ```bibtex
+ @misc{watanabe2018espnet,
+ title={ESPnet: End-to-End Speech Processing Toolkit},
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Enrique Yalta Soplin and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ year={2018},
+ eprint={1804.00015},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/config.yaml ADDED
@@ -0,0 +1,449 @@
+ config: ./conf/tuning/finetune_joint_conformer_fastspeech2_hifigan.v8.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space
+ ngpu: 1
+ seed: 777
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 51963
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 1000
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - text2mel_loss
+     - min
+ -   - train
+     - text2mel_loss
+     - min
+ -   - train
+     - total_count
+     - max
+ keep_nbest_models: 5
+ grad_clip: -1
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: 10
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param:
+ - exp/tts_train_conformer_fastspeech2_raw_phn_tacotron_g2p_en_no_space/train.loss.ave_5best.pth:tts:tts.generator.text2mel
+ - exp/ljspeech_hifigan.v1/generator.pth::tts.generator.vocoder
+ - exp/ljspeech_hifigan.v1/discriminator.pth::tts.discriminator
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: 500
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 5000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/text_shape.phn
+ - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/speech_shape
+ valid_shape_file:
+ - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/valid/text_shape.phn
+ - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/valid/speech_shape
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 150
+ - 204800
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/tr_no_dev/text
+     - text
+     - text
+ -   - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/tr_no_dev/durations
+     - durations
+     - text_int
+ -   - dump/raw/tr_no_dev/wav.scp
+     - speech
+     - sound
+ -   - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/collect_feats/pitch.scp
+     - pitch
+     - npy
+ -   - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/collect_feats/energy.scp
+     - energy
+     - npy
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev/text
+     - text
+     - text
+ -   - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/dev/durations
+     - durations
+     - text_int
+ -   - dump/raw/dev/wav.scp
+     - speech
+     - sound
+ -   - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/valid/collect_feats/pitch.scp
+     - pitch
+     - npy
+ -   - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/valid/collect_feats/energy.scp
+     - energy
+     - npy
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 1.25e-05
+     betas:
+     - 0.5
+     - 0.9
+     weight_decay: 0.0
+ scheduler: exponentiallr
+ scheduler_conf:
+     gamma: 0.999875
+ optim2: adam
+ optim2_conf:
+     lr: 1.25e-05
+     betas:
+     - 0.5
+     - 0.9
+     weight_decay: 0.0
+ scheduler2: exponentiallr
+ scheduler2_conf:
+     gamma: 0.999875
+ generator_first: true
+ token_list:
+ - <blank>
+ - <unk>
+ - AH0
+ - N
+ - T
+ - D
+ - S
+ - R
+ - L
+ - DH
+ - K
+ - Z
+ - IH1
+ - IH0
+ - M
+ - EH1
+ - W
+ - P
+ - AE1
+ - AH1
+ - V
+ - ER0
+ - F
+ - ','
+ - AA1
+ - B
+ - HH
+ - IY1
+ - UW1
+ - IY0
+ - AO1
+ - EY1
+ - AY1
+ - .
+ - OW1
+ - SH
+ - NG
+ - G
+ - ER1
+ - CH
+ - JH
+ - Y
+ - AW1
+ - TH
+ - UH1
+ - EH2
+ - OW0
+ - EY2
+ - AO0
+ - IH2
+ - AE2
+ - AY2
+ - AA2
+ - UW0
+ - EH0
+ - OY1
+ - EY0
+ - AO2
+ - ZH
+ - OW2
+ - AE0
+ - UW2
+ - AH2
+ - AY0
+ - IY2
+ - AW2
+ - AA0
+ - ''''
+ - ER2
+ - UH2
+ - '?'
+ - OY2
+ - '!'
+ - AW0
+ - UH0
+ - OY0
+ - ..
+ - <sos/eos>
+ odim: null
+ model_conf: {}
+ use_preprocessor: true
+ token_type: phn
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: tacotron
+ g2p: g2p_en_no_space
+ feats_extract: fbank
+ feats_extract_conf:
+     n_fft: 1024
+     hop_length: 256
+     win_length: null
+     fs: 22050
+     fmin: 80
+     fmax: 7600
+     n_mels: 80
+ normalize: global_mvn
+ normalize_conf:
+     stats_file: exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/feats_stats.npz
+ tts: joint_text2wav
+ tts_conf:
+     text2mel_type: fastspeech2
+     text2mel_params:
+         adim: 384
+         aheads: 2
+         conformer_activation_type: swish
+         conformer_dec_kernel_size: 31
+         conformer_enc_kernel_size: 7
+         conformer_pos_enc_layer_type: rel_pos
+         conformer_self_attn_layer_type: rel_selfattn
+         decoder_normalize_before: false
+         decoder_type: conformer
+         dlayers: 4
+         dunits: 1536
+         duration_predictor_chans: 256
+         duration_predictor_kernel_size: 3
+         duration_predictor_layers: 2
+         elayers: 4
+         encoder_normalize_before: false
+         encoder_type: conformer
+         energy_embed_dropout: 0.0
+         energy_embed_kernel_size: 1
+         energy_predictor_chans: 256
+         energy_predictor_dropout: 0.5
+         energy_predictor_kernel_size: 3
+         energy_predictor_layers: 2
+         eunits: 1536
+         init_type: xavier_uniform
+         pitch_embed_dropout: 0.0
+         pitch_embed_kernel_size: 1
+         pitch_predictor_chans: 256
+         pitch_predictor_dropout: 0.5
+         pitch_predictor_kernel_size: 5
+         pitch_predictor_layers: 5
+         positionwise_conv_kernel_size: 3
+         positionwise_layer_type: conv1d
+         postnet_chans: 256
+         postnet_filts: 5
+         postnet_layers: 5
+         reduction_factor: 1
+         stop_gradient_from_energy_predictor: false
+         stop_gradient_from_pitch_predictor: true
+         transformer_dec_attn_dropout_rate: 0.2
+         transformer_dec_dropout_rate: 0.2
+         transformer_dec_positional_dropout_rate: 0.2
+         transformer_enc_attn_dropout_rate: 0.2
+         transformer_enc_dropout_rate: 0.2
+         transformer_enc_positional_dropout_rate: 0.2
+         use_cnn_in_conformer: true
+         use_macaron_style_in_conformer: true
+         use_masking: true
+         idim: 78
+         odim: 80
+     vocoder_type: hifigan_generator
+     vocoder_params:
+         bias: true
+         channels: 512
+         in_channels: 80
+         kernel_size: 7
+         nonlinear_activation: LeakyReLU
+         nonlinear_activation_params:
+             negative_slope: 0.1
+         out_channels: 1
+         resblock_dilations:
+         -   - 1
+             - 3
+             - 5
+         -   - 1
+             - 3
+             - 5
+         -   - 1
+             - 3
+             - 5
+         resblock_kernel_sizes:
+         - 3
+         - 7
+         - 11
+         upsample_kernel_sizes:
+         - 16
+         - 16
+         - 4
+         - 4
+         upsample_scales:
+         - 8
+         - 8
+         - 2
+         - 2
+         use_additional_convs: true
+         use_weight_norm: true
+     discriminator_type: hifigan_multi_scale_multi_period_discriminator
+     discriminator_params:
+         follow_official_norm: true
+         period_discriminator_params:
+             bias: true
+             channels: 32
+             downsample_scales:
+             - 3
+             - 3
+             - 3
+             - 3
+             - 1
+             in_channels: 1
+             kernel_sizes:
+             - 5
+             - 3
+             max_downsample_channels: 1024
+             nonlinear_activation: LeakyReLU
+             nonlinear_activation_params:
+                 negative_slope: 0.1
+             out_channels: 1
+             use_spectral_norm: false
+             use_weight_norm: true
+         periods:
+         - 2
+         - 3
+         - 5
+         - 7
+         - 11
+         scale_discriminator_params:
+             bias: true
+             channels: 128
+             downsample_scales:
+             - 4
+             - 4
+             - 4
+             - 4
+             - 1
+             in_channels: 1
+             kernel_sizes:
+             - 15
+             - 41
+             - 5
+             - 3
+             max_downsample_channels: 1024
+             max_groups: 16
+             nonlinear_activation: LeakyReLU
+             nonlinear_activation_params:
+                 negative_slope: 0.1
+             out_channels: 1
+         scale_downsample_pooling: AvgPool1d
+         scale_downsample_pooling_params:
+             kernel_size: 4
+             padding: 2
+             stride: 2
+         scales: 3
+     generator_adv_loss_params:
+         average_by_discriminators: false
+         loss_type: mse
+     discriminator_adv_loss_params:
+         average_by_discriminators: false
+         loss_type: mse
+     use_feat_match_loss: true
+     feat_match_loss_params:
+         average_by_discriminators: false
+         average_by_layers: false
+         include_final_outputs: true
+     use_mel_loss: true
+     mel_loss_params:
+         fs: 22050
+         n_fft: 1024
+         hop_length: 256
+         win_length: null
+         window: hann
+         n_mels: 80
+         fmin: 0
+         fmax: null
+         log_base: null
+     lambda_text2mel: 1.0
+     lambda_adv: 1.0
+     lambda_mel: 45.0
+     lambda_feat_match: 2.0
+     sampling_rate: 22050
+     segment_size: 32
+     cache_generator_outputs: true
+ pitch_extract: dio
+ pitch_extract_conf:
+     reduction_factor: 1
+     fs: 22050
+     n_fft: 1024
+     hop_length: 256
+     f0max: 400
+     f0min: 80
+ pitch_normalize: global_mvn
+ pitch_normalize_conf:
+     stats_file: exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/pitch_stats.npz
+ energy_extract: energy
+ energy_extract_conf:
+     reduction_factor: 1
+     fs: 22050
+     n_fft: 1024
+     hop_length: 256
+     win_length: null
+ energy_normalize: global_mvn
+ energy_normalize_conf:
+     stats_file: exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/energy_stats.npz
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.3a1
+ distributed: true
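Several values in this config.yaml have to agree with each other for joint text-to-waveform training to line up. The sketch below is not part of the commit; it only copies numbers from the config and checks the arithmetic. It assumes that `segment_size` is counted in feature frames and that the exponential scheduler multiplies the learning rate by `gamma` once per scheduler step. How often ESPnet takes a scheduler step is not stated in this file, so `lr_after` takes the step count as an input. The helper name `lr_after` is hypothetical.

```python
import math

# Values copied from config.yaml above.
fs = 22050                      # feats_extract_conf.fs / tts_conf.sampling_rate
hop_length = 256                # feats_extract_conf.hop_length
upsample_scales = [8, 8, 2, 2]  # tts_conf.vocoder_params.upsample_scales
segment_size = 32               # tts_conf.segment_size (assumed to be in frames)
base_lr = 1.25e-05              # optim_conf.lr
gamma = 0.999875                # scheduler_conf.gamma

# The vocoder must turn one mel frame into hop_length waveform samples.
total_upsampling = math.prod(upsample_scales)

# Frame rate of the mel features fed to the vocoder.
frames_per_second = fs / hop_length

# Length of one randomly cropped training segment, under the frame assumption.
segment_samples = segment_size * total_upsampling
segment_seconds = segment_samples / fs


def lr_after(n_steps: int) -> float:
    """Learning rate after n_steps exponential decay steps (hypothetical helper)."""
    return base_lr * gamma ** n_steps


print(total_upsampling, frames_per_second, segment_samples)
print(f"{segment_seconds:.3f} s per segment, lr after 1000 steps: {lr_after(1000):.3e}")
```

The product of `upsample_scales` matching `hop_length` is the property worth checking first when adapting this config to another sampling rate or hop size.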
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/adv_loss.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/discriminator_backward_time.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/discriminator_forward_time.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/discriminator_loss.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/discriminator_optim_step_time.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/discriminator_train_time.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/duration_loss.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/energy_loss.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/fake_loss.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/feat_match_loss.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/generator_backward_time.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/generator_forward_time.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/generator_optim_step_time.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/generator_train_time.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/gpu_max_cached_mem_GB.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/iter_time.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/l1_loss.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/loss.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/mel_loss.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/optim0_lr0.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/optim1_lr0.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/pitch_loss.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/real_loss.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/text2mel_loss.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/images/train_time.png ADDED
exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/train.total_count.ave_5best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a18381c8d0f64ad79436883cf8d027eeb7974b7f98cec69475720855381b38bc
+ size 620334354
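The three lines added for `train.total_count.ave_5best.pth` are a Git LFS pointer, not the checkpoint itself. As a minimal sketch (not part of the commit), the pointer can be split into its fields with the standard library alone. The function name `parse_lfs_pointer` and the returned dictionary keys are hypothetical choices for illustration.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer content copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a18381c8d0f64ad79436883cf8d027eeb7974b7f98cec69475720855381b38bc
size 620334354
"""

info = parse_lfs_pointer(POINTER)
size_mib = info["size"] / 2**20
print(info["hash_algo"], info["digest"][:12], f"{size_mib:.1f} MiB")
```

After downloading the real checkpoint, its SHA-256 digest and byte size can be compared against `info["digest"]` and `info["size"]` to confirm the download is complete.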
exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/energy_stats.npz ADDED
Binary file (770 Bytes).
exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/feats_stats.npz ADDED
Binary file (1.4 kB).
exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/pitch_stats.npz ADDED
Binary file (770 Bytes).
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: 0.10.3a2
+ files:
+   model_file: exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/train.total_count.ave_5best.pth
+ python: "3.7.3 (default, Mar 27 2019, 22:11:17) \n[GCC 7.3.0]"
+ timestamp: 1631237823.97896
+ torch: 1.7.1
+ yaml_files:
+   train_config: exp/tts_finetune_joint_conformer_fastspeech2_hifigan.v8_raw_phn_tacotron_g2p_en_no_space/config.yaml
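The `timestamp` field in meta.yaml is a bare number. Assuming it is Unix epoch seconds (the file does not say so), the following sketch, which is not part of the commit, converts it to a readable UTC time.

```python
from datetime import datetime, timezone

# Value copied from meta.yaml above; assumed to be seconds since the Unix epoch.
timestamp = 1631237823.97896

packed_at = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(packed_at.isoformat())
```

Under that assumption the model was packed on 10 September 2021 (UTC).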