Siddhant committed on
Commit
a9dc64c
1 Parent(s): 6bfeb33

import from zenodo

Files changed (32)
  1. README.md +50 -0
  2. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/config.yaml +446 -0
  3. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/adv_loss.png +0 -0
  4. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/discriminator_backward_time.png +0 -0
  5. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/discriminator_forward_time.png +0 -0
  6. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/discriminator_loss.png +0 -0
  7. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/discriminator_optim_step_time.png +0 -0
  8. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/discriminator_train_time.png +0 -0
  9. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/duration_loss.png +0 -0
  10. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/energy_loss.png +0 -0
  11. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/fake_loss.png +0 -0
  12. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/feat_match_loss.png +0 -0
  13. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/generator_backward_time.png +0 -0
  14. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/generator_forward_time.png +0 -0
  15. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/generator_optim_step_time.png +0 -0
  16. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/generator_train_time.png +0 -0
  17. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/gpu_max_cached_mem_GB.png +0 -0
  18. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/iter_time.png +0 -0
  19. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/l1_loss.png +0 -0
  20. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/loss.png +0 -0
  21. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/mel_loss.png +0 -0
  22. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/optim0_lr0.png +0 -0
  23. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/optim1_lr0.png +0 -0
  24. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/pitch_loss.png +0 -0
  25. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/real_loss.png +0 -0
  26. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/text2mel_loss.png +0 -0
  27. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/train_time.png +0 -0
  28. exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/train.total_count.ave_10best.pth +3 -0
  29. exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/energy_stats.npz +0 -0
  30. exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/feats_stats.npz +0 -0
  31. exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/pitch_stats.npz +0 -0
  32. meta.yaml +8 -0
README.md ADDED
@@ -0,0 +1,50 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - text-to-speech
+ language: en
+ datasets:
+ - ljspeech
+ license: cc-by-4.0
+ ---
+ ## ESPnet2 TTS pretrained model
+ ### `kan-bayashi/ljspeech_tts_train_joint_conformer_fastspeech2_hifigan_raw_phn_tacotron_g2p_en_no_space_train.total_count.ave`
+ ♻️ Imported from https://zenodo.org/record/5498487/
+
+ This model was trained by kan-bayashi using ljspeech/tts1 recipe in [espnet](https://github.com/espnet/espnet/).
+ ### Demo: How to use in ESPnet2
+ ```python
+ # coming soon
+ ```
+ ### Citing ESPnet
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
+ year={2018},
+ booktitle={Proceedings of Interspeech},
+ pages={2207--2211},
+ doi={10.21437/Interspeech.2018-1456},
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+ @inproceedings{hayashi2020espnet,
+ title={{Espnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
+ author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
+ booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
+ pages={7654--7658},
+ year={2020},
+ organization={IEEE}
+ }
+ ```
+ or arXiv:
+ ```bibtex
+ @misc{watanabe2018espnet,
+ title={ESPnet: End-to-End Speech Processing Toolkit},
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Enrique Yalta Soplin and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ year={2018},
+ eprint={1804.00015},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
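The demo section of the added README is still a placeholder. As a rough sketch only, a model with this tag is typically loaded as follows. It assumes `espnet`, `espnet_model_zoo`, and `soundfile` are installed and that the tag from the README resolves through `Text2Speech.from_pretrained`. The helper name `synthesize` and the default output path are hypothetical and not part of this commit.

```python
# Model tag copied from the README heading in this commit.
MODEL_TAG = (
    "kan-bayashi/ljspeech_tts_train_joint_conformer_fastspeech2_hifigan"
    "_raw_phn_tacotron_g2p_en_no_space_train.total_count.ave"
)


def synthesize(text, out_path="out.wav", model_tag=MODEL_TAG):
    """Synthesize `text` to a WAV file (hypothetical helper, not from the repo)."""
    # Imported lazily so this module can be loaded without ESPnet installed.
    import soundfile as sf
    from espnet2.bin.tts_inference import Text2Speech

    # Assumption: the tag is resolvable by espnet_model_zoo; downloading the
    # roughly 540 MB checkpoint happens on first use.
    tts = Text2Speech.from_pretrained(model_tag=model_tag)

    # The config declares `tts: joint_text2wav`, so the output is expected to
    # contain a waveform without a separate vocoder.
    wav = tts(text)["wav"]
    sf.write(out_path, wav.view(-1).cpu().numpy(), tts.fs)
    return out_path
```

Calling `synthesize("Hello world.")` would then write `out.wav` at the model's sampling rate, which the config in this commit sets to 22050 Hz.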
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/config.yaml ADDED
@@ -0,0 +1,446 @@
+ config: ./conf/tuning/train_joint_conformer_fastspeech2_hifigan.v2.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space
+ ngpu: 1
+ seed: 777
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 57589
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 2000
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - train
+     - total_count
+     - max
+ keep_nbest_models: 10
+ grad_clip: -1
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: 50
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: 500
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 5000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/text_shape.phn
+ - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/speech_shape
+ valid_shape_file:
+ - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/valid/text_shape.phn
+ - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/valid/speech_shape
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 150
+ - 204800
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/tr_no_dev/text
+     - text
+     - text
+ -   - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/tr_no_dev/durations
+     - durations
+     - text_int
+ -   - dump/raw/tr_no_dev/wav.scp
+     - speech
+     - sound
+ -   - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/collect_feats/pitch.scp
+     - pitch
+     - npy
+ -   - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/collect_feats/energy.scp
+     - energy
+     - npy
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev/text
+     - text
+     - text
+ -   - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/dev/durations
+     - durations
+     - text_int
+ -   - dump/raw/dev/wav.scp
+     - speech
+     - sound
+ -   - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/valid/collect_feats/pitch.scp
+     - pitch
+     - npy
+ -   - exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/valid/collect_feats/energy.scp
+     - energy
+     - npy
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adamw
+ optim_conf:
+     lr: 0.0002
+     betas:
+     - 0.8
+     - 0.99
+     eps: 1.0e-09
+     weight_decay: 0.0
+ scheduler: exponentiallr
+ scheduler_conf:
+     gamma: 0.999875
+ optim2: adamw
+ optim2_conf:
+     lr: 0.0002
+     betas:
+     - 0.8
+     - 0.99
+     eps: 1.0e-09
+     weight_decay: 0.0
+ scheduler2: exponentiallr
+ scheduler2_conf:
+     gamma: 0.999875
+ generator_first: true
+ token_list:
+ - <blank>
+ - <unk>
+ - AH0
+ - N
+ - T
+ - D
+ - S
+ - R
+ - L
+ - DH
+ - K
+ - Z
+ - IH1
+ - IH0
+ - M
+ - EH1
+ - W
+ - P
+ - AE1
+ - AH1
+ - V
+ - ER0
+ - F
+ - ','
+ - AA1
+ - B
+ - HH
+ - IY1
+ - UW1
+ - IY0
+ - AO1
+ - EY1
+ - AY1
+ - .
+ - OW1
+ - SH
+ - NG
+ - G
+ - ER1
+ - CH
+ - JH
+ - Y
+ - AW1
+ - TH
+ - UH1
+ - EH2
+ - OW0
+ - EY2
+ - AO0
+ - IH2
+ - AE2
+ - AY2
+ - AA2
+ - UW0
+ - EH0
+ - OY1
+ - EY0
+ - AO2
+ - ZH
+ - OW2
+ - AE0
+ - UW2
+ - AH2
+ - AY0
+ - IY2
+ - AW2
+ - AA0
+ - ''''
+ - ER2
+ - UH2
+ - '?'
+ - OY2
+ - '!'
+ - AW0
+ - UH0
+ - OY0
+ - ..
+ - <sos/eos>
+ odim: null
+ model_conf: {}
+ use_preprocessor: true
+ token_type: phn
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: tacotron
+ g2p: g2p_en_no_space
+ feats_extract: fbank
+ feats_extract_conf:
+     n_fft: 1024
+     hop_length: 256
+     win_length: null
+     fs: 22050
+     fmin: 80
+     fmax: 7600
+     n_mels: 80
+ normalize: global_mvn
+ normalize_conf:
+     stats_file: exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/feats_stats.npz
+ tts: joint_text2wav
+ tts_conf:
+     text2mel_type: fastspeech2
+     text2mel_params:
+         adim: 384
+         aheads: 2
+         elayers: 4
+         eunits: 1536
+         dlayers: 4
+         dunits: 1536
+         positionwise_layer_type: conv1d
+         positionwise_conv_kernel_size: 3
+         duration_predictor_layers: 2
+         duration_predictor_chans: 256
+         duration_predictor_kernel_size: 3
+         postnet_layers: 5
+         postnet_filts: 5
+         postnet_chans: 256
+         use_masking: true
+         encoder_normalize_before: true
+         decoder_normalize_before: true
+         reduction_factor: 1
+         encoder_type: conformer
+         decoder_type: conformer
+         conformer_rel_pos_type: latest
+         conformer_pos_enc_layer_type: rel_pos
+         conformer_self_attn_layer_type: rel_selfattn
+         conformer_activation_type: swish
+         use_macaron_style_in_conformer: true
+         use_cnn_in_conformer: true
+         conformer_enc_kernel_size: 7
+         conformer_dec_kernel_size: 31
+         init_type: xavier_uniform
+         transformer_enc_dropout_rate: 0.2
+         transformer_enc_positional_dropout_rate: 0.2
+         transformer_enc_attn_dropout_rate: 0.2
+         transformer_dec_dropout_rate: 0.2
+         transformer_dec_positional_dropout_rate: 0.2
+         transformer_dec_attn_dropout_rate: 0.2
+         pitch_predictor_layers: 5
+         pitch_predictor_chans: 256
+         pitch_predictor_kernel_size: 5
+         pitch_predictor_dropout: 0.5
+         pitch_embed_kernel_size: 1
+         pitch_embed_dropout: 0.0
+         stop_gradient_from_pitch_predictor: true
+         energy_predictor_layers: 2
+         energy_predictor_chans: 256
+         energy_predictor_kernel_size: 3
+         energy_predictor_dropout: 0.5
+         energy_embed_kernel_size: 1
+         energy_embed_dropout: 0.0
+         stop_gradient_from_energy_predictor: false
+         idim: 78
+         odim: 80
+     vocoder_type: hifigan_generator
+     vocoder_params:
+         out_channels: 1
+         channels: 512
+         global_channels: -1
+         kernel_size: 7
+         upsample_scales:
+         - 8
+         - 8
+         - 2
+         - 2
+         upsample_kernel_sizes:
+         - 16
+         - 16
+         - 4
+         - 4
+         resblock_kernel_sizes:
+         - 3
+         - 7
+         - 11
+         resblock_dilations:
+         -   - 1
+             - 3
+             - 5
+         -   - 1
+             - 3
+             - 5
+         -   - 1
+             - 3
+             - 5
+         use_additional_convs: true
+         bias: true
+         nonlinear_activation: LeakyReLU
+         nonlinear_activation_params:
+             negative_slope: 0.1
+         use_weight_norm: true
+         in_channels: 80
+     discriminator_type: hifigan_multi_scale_multi_period_discriminator
+     discriminator_params:
+         scales: 1
+         scale_downsample_pooling: AvgPool1d
+         scale_downsample_pooling_params:
+             kernel_size: 4
+             stride: 2
+             padding: 2
+         scale_discriminator_params:
+             in_channels: 1
+             out_channels: 1
+             kernel_sizes:
+             - 15
+             - 41
+             - 5
+             - 3
+             channels: 128
+             max_downsample_channels: 1024
+             max_groups: 16
+             bias: true
+             downsample_scales:
+             - 2
+             - 2
+             - 4
+             - 4
+             - 1
+             nonlinear_activation: LeakyReLU
+             nonlinear_activation_params:
+                 negative_slope: 0.1
+             use_weight_norm: true
+             use_spectral_norm: false
+         follow_official_norm: false
+         periods:
+         - 2
+         - 3
+         - 5
+         - 7
+         - 11
+         period_discriminator_params:
+             in_channels: 1
+             out_channels: 1
+             kernel_sizes:
+             - 5
+             - 3
+             channels: 32
+             downsample_scales:
+             - 3
+             - 3
+             - 3
+             - 3
+             - 1
+             max_downsample_channels: 1024
+             bias: true
+             nonlinear_activation: LeakyReLU
+             nonlinear_activation_params:
+                 negative_slope: 0.1
+             use_weight_norm: true
+             use_spectral_norm: false
+     generator_adv_loss_params:
+         average_by_discriminators: false
+         loss_type: mse
+     discriminator_adv_loss_params:
+         average_by_discriminators: false
+         loss_type: mse
+     use_feat_match_loss: true
+     feat_match_loss_params:
+         average_by_discriminators: false
+         average_by_layers: false
+         include_final_outputs: true
+     use_mel_loss: true
+     mel_loss_params:
+         fs: 22050
+         n_fft: 1024
+         hop_length: 256
+         win_length: null
+         window: hann
+         n_mels: 80
+         fmin: 0
+         fmax: null
+         log_base: null
+     lambda_text2mel: 1.0
+     lambda_adv: 1.0
+     lambda_mel: 45.0
+     lambda_feat_match: 2.0
+     sampling_rate: 22050
+     segment_size: 32
+     cache_generator_outputs: true
+ pitch_extract: dio
+ pitch_extract_conf:
+     reduction_factor: 1
+     fs: 22050
+     n_fft: 1024
+     hop_length: 256
+     f0max: 400
+     f0min: 80
+ pitch_normalize: global_mvn
+ pitch_normalize_conf:
+     stats_file: exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/pitch_stats.npz
+ energy_extract: energy
+ energy_extract_conf:
+     reduction_factor: 1
+     fs: 22050
+     n_fft: 1024
+     hop_length: 256
+     win_length: null
+ energy_normalize: global_mvn
+ energy_normalize_conf:
+     stats_file: exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/energy_stats.npz
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.3a1
+ distributed: true
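Several numbers in this config depend on each other, and a short sanity check makes the relationships explicit. The sketch below copies values from the config above. It assumes that `segment_size` is counted in mel frames and that the exponential scheduler is stepped once per epoch. Neither is stated in the file itself, so both should be read as assumptions.

```python
from math import prod

# Values copied from config.yaml in this commit.
fs = 22050                      # feats_extract_conf.fs and tts_conf.sampling_rate
hop_length = 256                # feats_extract_conf.hop_length
upsample_scales = [8, 8, 2, 2]  # tts_conf.vocoder_params.upsample_scales
segment_size = 32               # tts_conf.segment_size (assumed to be in frames)
lr0 = 0.0002                    # optim_conf.lr
gamma = 0.999875                # scheduler_conf.gamma
max_epoch = 2000

# The vocoder has to expand one mel frame into hop_length waveform samples,
# so the product of the upsampling factors should equal the hop length.
total_upsampling = prod(upsample_scales)

frames_per_second = fs / hop_length
segment_samples = segment_size * hop_length
segment_seconds = segment_samples / fs


def lr_after(steps, lr=lr0, g=gamma):
    """Learning rate after `steps` scheduler steps of exponential decay."""
    return lr * g ** steps


print(total_upsampling, frames_per_second, segment_samples)
print(round(segment_seconds, 3), lr_after(max_epoch))
```

Under these assumptions the upsampling product equals the hop length of 256, each training segment covers 8192 samples (about 0.37 s), and the learning rate decays to roughly 78% of its initial value by epoch 2000. A similar check applies to `idim: 78`, which matches the 78 entries in `token_list`.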
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/adv_loss.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/discriminator_backward_time.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/discriminator_forward_time.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/discriminator_loss.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/discriminator_optim_step_time.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/discriminator_train_time.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/duration_loss.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/energy_loss.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/fake_loss.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/feat_match_loss.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/generator_backward_time.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/generator_forward_time.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/generator_optim_step_time.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/generator_train_time.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/gpu_max_cached_mem_GB.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/iter_time.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/l1_loss.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/loss.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/mel_loss.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/optim0_lr0.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/optim1_lr0.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/pitch_loss.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/real_loss.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/text2mel_loss.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/images/train_time.png ADDED
exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/train.total_count.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:00cb215dbca7128e2475aa65eb1decb3999be344759e1812555fefdf2080bc49
+ size 541365462
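The three added lines are a Git LFS pointer, not the checkpoint itself. They record the hash algorithm, the digest, and the byte size of the real file. A minimal sketch of how a downloaded checkpoint could be checked against this pointer, using only the standard library, is given below. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib

# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:00cb215dbca7128e2475aa65eb1decb3999be344759e1812555fefdf2080bc49
size 541365462
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest in `pointer`."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so a ~540 MB checkpoint is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
```

After downloading `train.total_count.ave_10best.pth`, calling `matches_pointer(path, pointer)` with the local path would confirm that the file is the one this commit refers to.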
exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/energy_stats.npz ADDED
Binary file (770 Bytes).
exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/feats_stats.npz ADDED
Binary file (1.4 kB).
exp/tts_train_tacotron2_raw_phn_tacotron_g2p_en_no_space/decode_use_teacher_forcingtrue_train.loss.best/stats/train/pitch_stats.npz ADDED
Binary file (770 Bytes).
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: 0.10.3a2
+ files:
+   model_file: exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/train.total_count.ave_10best.pth
+ python: "3.7.3 (default, Mar 27 2019, 22:11:17) \n[GCC 7.3.0]"
+ timestamp: 1631233021.62745
+ torch: 1.7.1
+ yaml_files:
+   train_config: exp/tts_train_joint_conformer_fastspeech2_hifigan.v2_raw_phn_tacotron_g2p_en_no_space/config.yaml
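The `timestamp` field is easier to read as a date. Assuming it is a Unix epoch value in seconds, which the magnitude suggests but the file does not state, it can be converted like this:

```python
from datetime import datetime, timezone

# Value of `timestamp` in meta.yaml; assumed to be Unix epoch seconds.
timestamp = 1631233021.62745

packed_at = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(packed_at.isoformat())
```

Under that assumption the model was packed on 2021-09-10 at 00:17:01 UTC. Note also that `meta.yaml` records `espnet: 0.10.3a2` while `config.yaml` records `version: 0.10.3a1`. Presumably training and packing ran under slightly different ESPnet versions.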