Siddhant committed on
Commit
2cd972f
1 Parent(s): f8f93ab

import from zenodo

Files changed (27)
  1. README.md +50 -0
  2. dump/44.1k/raw/org/tr_no_dev/spk2sid +109 -0
  3. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/config.yaml +392 -0
  4. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_backward_time.png +0 -0
  5. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_fake_loss.png +0 -0
  6. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_forward_time.png +0 -0
  7. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_loss.png +0 -0
  8. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_optim_step_time.png +0 -0
  9. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_real_loss.png +0 -0
  10. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_train_time.png +0 -0
  11. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_adv_loss.png +0 -0
  12. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_backward_time.png +0 -0
  13. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_dur_loss.png +0 -0
  14. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_feat_match_loss.png +0 -0
  15. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_forward_time.png +0 -0
  16. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_kl_loss.png +0 -0
  17. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_loss.png +0 -0
  18. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_mel_loss.png +0 -0
  19. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_optim_step_time.png +0 -0
  20. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_train_time.png +0 -0
  21. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/gpu_max_cached_mem_GB.png +0 -0
  22. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/iter_time.png +0 -0
  23. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/optim0_lr0.png +0 -0
  24. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/optim1_lr0.png +0 -0
  25. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/train_time.png +0 -0
  26. exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/train.total_count.ave_10best.pth +3 -0
  27. meta.yaml +8 -0
README.md ADDED
@@ -0,0 +1,50 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - text-to-speech
+ language: en
+ datasets:
+ - vctk
+ license: cc-by-4.0
+ ---
+ ## ESPnet2 TTS pretrained model
+ ### `kan-bayashi/vctk_full_band_multi_spk_vits`
+ ♻️ Imported from https://zenodo.org/record/5521431/
+
+ This model was trained by kan-bayashi using vctk/tts1 recipe in [espnet](https://github.com/espnet/espnet/).
+ ### Demo: How to use in ESPnet2
+ ```python
+ # coming soon
+ ```
+ ### Citing ESPnet
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
+ year={2018},
+ booktitle={Proceedings of Interspeech},
+ pages={2207--2211},
+ doi={10.21437/Interspeech.2018-1456},
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+ @inproceedings{hayashi2020espnet,
+ title={{Espnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
+ author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
+ booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
+ pages={7654--7658},
+ year={2020},
+ organization={IEEE}
+ }
+ ```
+ or arXiv:
+ ```bibtex
+ @misc{watanabe2018espnet,
+ title={ESPnet: End-to-End Speech Processing Toolkit},
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Enrique Yalta Soplin and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ year={2018},
+ eprint={1804.00015},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
dump/44.1k/raw/org/tr_no_dev/spk2sid ADDED
@@ -0,0 +1,109 @@
+ <unk> 0
+ p225 1
+ p226 2
+ p227 3
+ p228 4
+ p229 5
+ p230 6
+ p231 7
+ p232 8
+ p233 9
+ p234 10
+ p236 11
+ p237 12
+ p238 13
+ p239 14
+ p240 15
+ p241 16
+ p243 17
+ p244 18
+ p245 19
+ p246 20
+ p247 21
+ p248 22
+ p249 23
+ p250 24
+ p251 25
+ p252 26
+ p253 27
+ p254 28
+ p255 29
+ p256 30
+ p257 31
+ p258 32
+ p259 33
+ p260 34
+ p261 35
+ p262 36
+ p263 37
+ p264 38
+ p265 39
+ p266 40
+ p267 41
+ p268 42
+ p269 43
+ p270 44
+ p271 45
+ p272 46
+ p273 47
+ p274 48
+ p275 49
+ p276 50
+ p277 51
+ p278 52
+ p279 53
+ p280 54
+ p281 55
+ p282 56
+ p283 57
+ p284 58
+ p285 59
+ p286 60
+ p287 61
+ p288 62
+ p292 63
+ p293 64
+ p294 65
+ p295 66
+ p297 67
+ p298 68
+ p299 69
+ p300 70
+ p301 71
+ p302 72
+ p303 73
+ p304 74
+ p305 75
+ p306 76
+ p307 77
+ p308 78
+ p310 79
+ p311 80
+ p312 81
+ p313 82
+ p314 83
+ p316 84
+ p317 85
+ p318 86
+ p323 87
+ p326 88
+ p329 89
+ p330 90
+ p333 91
+ p334 92
+ p335 93
+ p336 94
+ p339 95
+ p340 96
+ p341 97
+ p343 98
+ p345 99
+ p347 100
+ p351 101
+ p360 102
+ p361 103
+ p362 104
+ p363 105
+ p364 106
+ p374 107
+ p376 108
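The `spk2sid` file above pairs each VCTK speaker label with an integer speaker ID, and the training config feeds per-utterance IDs (`utt2sid`) to the model under the name `sids`. A minimal sketch of reading such a file follows, assuming every non-empty line is a whitespace-separated `label id` pair. `load_spk2sid` is a hypothetical helper written for illustration, not an ESPnet function.

```python
from typing import Dict


def load_spk2sid(text: str) -> Dict[str, int]:
    """Parse 'label id' lines into a dict (hypothetical helper, not an ESPnet API)."""
    mapping: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label, sid = line.split()
        mapping[label] = int(sid)
    return mapping


# A few entries copied from the spk2sid file in this commit.
sample = """<unk> 0
p225 1
p226 2
p376 108
"""

spk2sid = load_spk2sid(sample)
print(spk2sid["p225"], spk2sid["p376"])
```

Note that the speaker labels are not contiguous (for example, there is no `p235`), so the ID cannot be derived from the label by arithmetic; a lookup table like this is needed.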
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/config.yaml ADDED
@@ -0,0 +1,392 @@
+ config: ./conf/tuning/train_full_band_multi_spk_vits.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space
+ ngpu: 1
+ seed: 777
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 37577
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 2000
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - train
+     - total_count
+     - max
+ keep_nbest_models: 10
+ grad_clip: -1
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: 50
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: 500
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 4000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/tts_stats_44.1k_raw_linear_spectrogram_phn_tacotron_g2p_en_no_space/train/text_shape.phn
+ - exp/tts_stats_44.1k_raw_linear_spectrogram_phn_tacotron_g2p_en_no_space/train/speech_shape
+ valid_shape_file:
+ - exp/tts_stats_44.1k_raw_linear_spectrogram_phn_tacotron_g2p_en_no_space/valid/text_shape.phn
+ - exp/tts_stats_44.1k_raw_linear_spectrogram_phn_tacotron_g2p_en_no_space/valid/speech_shape
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 150
+ - 409600
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/44.1k/raw/tr_no_dev/text
+     - text
+     - text
+ -   - dump/44.1k/raw/tr_no_dev/wav.scp
+     - speech
+     - sound
+ -   - dump/44.1k/raw/tr_no_dev/utt2sid
+     - sids
+     - text_int
+ valid_data_path_and_name_and_type:
+ -   - dump/44.1k/raw/dev/text
+     - text
+     - text
+ -   - dump/44.1k/raw/dev/wav.scp
+     - speech
+     - sound
+ -   - dump/44.1k/raw/dev/utt2sid
+     - sids
+     - text_int
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adamw
+ optim_conf:
+     lr: 0.0002
+     betas:
+     - 0.8
+     - 0.99
+     eps: 1.0e-09
+     weight_decay: 0.0
+ scheduler: exponentiallr
+ scheduler_conf:
+     gamma: 0.999875
+ optim2: adamw
+ optim2_conf:
+     lr: 0.0002
+     betas:
+     - 0.8
+     - 0.99
+     eps: 1.0e-09
+     weight_decay: 0.0
+ scheduler2: exponentiallr
+ scheduler2_conf:
+     gamma: 0.999875
+ generator_first: false
+ token_list:
+ - <blank>
+ - <unk>
+ - AH0
+ - T
+ - N
+ - S
+ - R
+ - IH1
+ - D
+ - L
+ - .
+ - Z
+ - DH
+ - K
+ - W
+ - M
+ - AE1
+ - EH1
+ - AA1
+ - IH0
+ - IY1
+ - AH1
+ - B
+ - P
+ - V
+ - ER0
+ - F
+ - HH
+ - AY1
+ - EY1
+ - UW1
+ - IY0
+ - AO1
+ - OW1
+ - G
+ - ','
+ - NG
+ - SH
+ - Y
+ - JH
+ - AW1
+ - UH1
+ - TH
+ - ER1
+ - CH
+ - '?'
+ - OW0
+ - OW2
+ - EH2
+ - EY2
+ - UW0
+ - IH2
+ - OY1
+ - AY2
+ - ZH
+ - AW2
+ - EH0
+ - IY2
+ - AA2
+ - AE0
+ - AH2
+ - AE2
+ - AO0
+ - AO2
+ - AY0
+ - UW2
+ - UH2
+ - AA0
+ - AW0
+ - EY0
+ - '!'
+ - UH0
+ - ER2
+ - OY2
+ - ''''
+ - OY0
+ - <sos/eos>
+ odim: null
+ model_conf: {}
+ use_preprocessor: true
+ token_type: phn
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: tacotron
+ g2p: g2p_en_no_space
+ feats_extract: linear_spectrogram
+ feats_extract_conf:
+     n_fft: 2048
+     hop_length: 512
+     win_length: null
+ normalize: null
+ normalize_conf: {}
+ tts: vits
+ tts_conf:
+     generator_type: vits_generator
+     generator_params:
+         hidden_channels: 192
+         spks: 128
+         global_channels: 256
+         segment_size: 32
+         text_encoder_attention_heads: 2
+         text_encoder_ffn_expand: 4
+         text_encoder_blocks: 6
+         text_encoder_positionwise_layer_type: conv1d
+         text_encoder_positionwise_conv_kernel_size: 3
+         text_encoder_positional_encoding_layer_type: rel_pos
+         text_encoder_self_attention_layer_type: rel_selfattn
+         text_encoder_activation_type: swish
+         text_encoder_normalize_before: true
+         text_encoder_dropout_rate: 0.1
+         text_encoder_positional_dropout_rate: 0.0
+         text_encoder_attention_dropout_rate: 0.1
+         use_macaron_style_in_text_encoder: true
+         use_conformer_conv_in_text_encoder: false
+         text_encoder_conformer_kernel_size: -1
+         decoder_kernel_size: 7
+         decoder_channels: 512
+         decoder_upsample_scales:
+         - 8
+         - 8
+         - 2
+         - 2
+         - 2
+         decoder_upsample_kernel_sizes:
+         - 16
+         - 16
+         - 4
+         - 4
+         - 4
+         decoder_resblock_kernel_sizes:
+         - 3
+         - 7
+         - 11
+         decoder_resblock_dilations:
+         -   - 1
+             - 3
+             - 5
+         -   - 1
+             - 3
+             - 5
+         -   - 1
+             - 3
+             - 5
+         use_weight_norm_in_decoder: true
+         posterior_encoder_kernel_size: 5
+         posterior_encoder_layers: 16
+         posterior_encoder_stacks: 1
+         posterior_encoder_base_dilation: 1
+         posterior_encoder_dropout_rate: 0.0
+         use_weight_norm_in_posterior_encoder: true
+         flow_flows: 4
+         flow_kernel_size: 5
+         flow_base_dilation: 1
+         flow_layers: 4
+         flow_dropout_rate: 0.0
+         use_weight_norm_in_flow: true
+         use_only_mean_in_flow: true
+         stochastic_duration_predictor_kernel_size: 3
+         stochastic_duration_predictor_dropout_rate: 0.5
+         stochastic_duration_predictor_flows: 4
+         stochastic_duration_predictor_dds_conv_layers: 3
+         vocabs: 77
+         aux_channels: 1025
+     discriminator_type: hifigan_multi_scale_multi_period_discriminator
+     discriminator_params:
+         scales: 1
+         scale_downsample_pooling: AvgPool1d
+         scale_downsample_pooling_params:
+             kernel_size: 4
+             stride: 2
+             padding: 2
+         scale_discriminator_params:
+             in_channels: 1
+             out_channels: 1
+             kernel_sizes:
+             - 15
+             - 41
+             - 5
+             - 3
+             channels: 128
+             max_downsample_channels: 1024
+             max_groups: 16
+             bias: true
+             downsample_scales:
+             - 2
+             - 2
+             - 4
+             - 4
+             - 1
+             nonlinear_activation: LeakyReLU
+             nonlinear_activation_params:
+                 negative_slope: 0.1
+             use_weight_norm: true
+             use_spectral_norm: false
+         follow_official_norm: false
+         periods:
+         - 2
+         - 3
+         - 5
+         - 7
+         - 11
+         period_discriminator_params:
+             in_channels: 1
+             out_channels: 1
+             kernel_sizes:
+             - 5
+             - 3
+             channels: 32
+             downsample_scales:
+             - 3
+             - 3
+             - 3
+             - 3
+             - 1
+             max_downsample_channels: 1024
+             bias: true
+             nonlinear_activation: LeakyReLU
+             nonlinear_activation_params:
+                 negative_slope: 0.1
+             use_weight_norm: true
+             use_spectral_norm: false
+     generator_adv_loss_params:
+         average_by_discriminators: false
+         loss_type: mse
+     discriminator_adv_loss_params:
+         average_by_discriminators: false
+         loss_type: mse
+     feat_match_loss_params:
+         average_by_discriminators: false
+         average_by_layers: false
+         include_final_outputs: true
+     mel_loss_params:
+         fs: 44100
+         n_fft: 2048
+         hop_length: 512
+         win_length: null
+         window: hann
+         n_mels: 80
+         fmin: 0
+         fmax: null
+         log_base: null
+     lambda_adv: 1.0
+     lambda_mel: 45.0
+     lambda_feat_match: 2.0
+     lambda_dur: 1.0
+     lambda_kl: 1.0
+     sampling_rate: 44100
+     cache_generator_outputs: true
+ pitch_extract: null
+ pitch_extract_conf: {}
+ pitch_normalize: null
+ pitch_normalize_conf: {}
+ energy_extract: null
+ energy_extract_conf: {}
+ energy_normalize: null
+ energy_normalize_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.3a2
+ distributed: true
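Several values in this config are tied together arithmetically, which is useful to know before editing any of them. The sketch below copies the relevant numbers from the config and checks the relationships. It is an illustrative sanity check written for this page, not code from the ESPnet recipe, and the variable names are chosen here.

```python
# Values copied from config.yaml in this commit.
n_fft = 2048
hop_length = 512
decoder_upsample_scales = [8, 8, 2, 2, 2]
aux_channels = 1025
segment_size = 32          # training segment length, in spectrogram frames
sampling_rate = 44100

# The decoder turns one latent frame into a run of waveform samples, so the
# product of its upsampling factors should match the STFT hop length.
total_upsample = 1
for scale in decoder_upsample_scales:
    total_upsample *= scale

# A linear spectrogram from an n_fft-point FFT has n_fft // 2 + 1 bins,
# which is what aux_channels (the posterior encoder input size) holds.
n_bins = n_fft // 2 + 1

# Length of one randomly cropped training segment in samples and seconds.
segment_samples = segment_size * hop_length
segment_seconds = segment_samples / sampling_rate

print(total_upsample, n_bins, segment_samples, round(segment_seconds, 4))
```

Two further checks can be made by counting: `token_list` has 77 entries, matching `vocabs: 77`, and `spk2sid` defines 109 IDs (0 to 108), which fits within `spks: 128`.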
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_backward_time.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_fake_loss.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_forward_time.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_loss.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_optim_step_time.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_real_loss.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/discriminator_train_time.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_adv_loss.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_backward_time.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_dur_loss.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_feat_match_loss.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_forward_time.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_kl_loss.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_loss.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_mel_loss.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_optim_step_time.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/generator_train_time.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/gpu_max_cached_mem_GB.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/iter_time.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/optim0_lr0.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/optim1_lr0.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/images/train_time.png ADDED
exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/train.total_count.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:50c569818a8cea5b4328b8cd2d3bfbea189110889cd3e4c17e91aea9aac1ee24
+ size 386796934
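The three lines added for the `.pth` file are a Git LFS pointer, not the checkpoint itself: the repository stores only the spec version, the SHA-256 object ID, and the size in bytes, while the weights live in LFS storage. A minimal sketch of reading such a pointer is shown here, assuming one space-separated `key value` pair per line. `parse_lfs_pointer` is a hypothetical helper, not part of Git LFS or ESPnet.

```python
from typing import Dict, Union


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Split a Git LFS pointer into its fields (hypothetical helper)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:50c569818a8cea5b4328b8cd2d3bfbea189110889cd3e4c17e91aea9aac1ee24
size 386796934
"""

info = parse_lfs_pointer(pointer)
size_mib = info["size"] / 2**20
print(info["hash_algo"], len(info["digest"]), round(size_mib, 1))
```

The size field shows the checkpoint is roughly 369 MiB. After downloading the real file, its SHA-256 digest can be compared against the `oid` value to confirm the download is intact.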
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: 0.10.3a2
+ files:
+     model_file: exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/train.total_count.ave_10best.pth
+ python: "3.7.3 (default, Mar 27 2019, 22:11:17) \n[GCC 7.3.0]"
+ timestamp: 1632318552.576057
+ torch: 1.7.1
+ yaml_files:
+     train_config: exp/tts_train_full_band_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space/config.yaml