Yoshiki committed on
Commit
33e2da5
1 Parent(s): 0ac6389

Update model

Files changed (40)
  1. README.md +413 -0
  2. data/nlsyms.txt +3 -0
  3. espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/config.yaml +193 -0
  4. espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/backward_time.png +0 -0
  5. espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/clip.png +0 -0
  6. espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/forward_time.png +0 -0
  7. espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/gpu_max_cached_mem_GB.png +0 -0
  8. espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/grad_norm.png +0 -0
  9. espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/iter_time.png +0 -0
  10. espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/loss.png +0 -0
  11. espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/loss_scale.png +0 -0
  12. espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/optim0_lr0.png +0 -0
  13. espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/optim_step_time.png +0 -0
  14. espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/train_time.png +0 -0
  15. espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/perplexity_test/ppl +1 -0
  16. espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/valid.loss.ave_10best.pth +3 -0
  17. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/8epoch.pth +3 -0
  18. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/RESULTS.md +29 -0
  19. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/RESULTS_enh.md +20 -0
  20. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/config.yaml +315 -0
  21. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/acc.png +0 -0
  22. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/backward_time.png +0 -0
  23. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/cer.png +0 -0
  24. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/cer_ctc.png +0 -0
  25. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/clip.png +0 -0
  26. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/forward_time.png +0 -0
  27. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/gpu_max_cached_mem_GB.png +0 -0
  28. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/grad_norm.png +0 -0
  29. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/iter_time.png +0 -0
  30. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/loss.png +0 -0
  31. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/loss_asr.png +0 -0
  32. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/loss_att.png +0 -0
  33. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/loss_ctc.png +0 -0
  34. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/loss_enh.png +0 -0
  35. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/loss_scale.png +0 -0
  36. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/optim0_lr0.png +0 -0
  37. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/optim_step_time.png +0 -0
  38. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/train_time.png +0 -0
  39. exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/wer.png +0 -0
  40. meta.yaml +10 -0
README.md CHANGED
@@ -1,3 +1,416 @@
  ---
+ tags:
+ - espnet
+ - audio
+ - speech-enhancement-recognition
+ language: en
+ datasets:
+ - wsj0_2mix_spatialized
  license: cc-by-4.0
  ---
+
+ ## ESPnet2 EnhS2T model
+
+ ### `espnet/yoshiki_wsj0_2mix_spatialized_enh_asr_tfgridnet_waspaa2023_raw_en_char`
+
+ This model was trained by Yoshiki using wsj0_2mix_spatialized recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ ```bash
+ cd espnet
+
+ pip install -e .
+ cd egs2/wsj0_2mix_spatialized/enh_asr1
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/yoshiki_wsj0_2mix_spatialized_enh_asr_tfgridnet_waspaa2023_raw_en_char
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Sun Aug 13 19:05:53 UTC 2023`
+ - python version: `3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]`
+ - espnet version: `espnet 202304`
+ - pytorch version: `pytorch 1.10.1+cu111`
+ - Git hash: ``
+ - Commit date: ``
+
+ ## exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.best/tt_spatialized_anechoic_multich_max_16k|6000|98613|98.7|1.2|0.1|0.5|1.7|16.5|
+ |decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.best/tt_spatialized_reverb_multich_max_16k|6000|98613|98.7|1.3|0.1|0.4|1.7|17.8|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.best/tt_spatialized_anechoic_multich_max_16k|6000|598296|99.6|0.2|0.2|0.3|0.7|21.6|
+ |decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.best/tt_spatialized_reverb_multich_max_16k|6000|598296|99.6|0.2|0.3|0.3|0.7|23.0|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+
+ ## EnhS2T config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_enh_asr_tfgridnet_waspaa.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/enh_asr_train_enh_asr_tfgridnet_waspaa_raw_en_char
+ ngpu: 1
+ seed: 0
+ num_workers: 0
+ num_att_plot: 0
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 11
+ patience: 10
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - acc
+     - max
+ -   - train
+     - loss
+     - min
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 4
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param:
+ - ../enh1/exp/enh_train_enh_tfgridnet_waspaa2023_raw/valid.loss.best.pth:separator:enh_model.separator
+ - ../../wsj/asr1/exp/asr_train_asr_conformer_s3prlfrontend_wavlm_raw_en_char/valid.acc.best.pth:frontend:s2t_model.frontend
+ - ../../wsj/asr1/exp/asr_train_asr_conformer_s3prlfrontend_wavlm_raw_en_char/valid.acc.best.pth:preencoder:s2t_model.preencoder
+ - ../../wsj/asr1/exp/asr_train_asr_conformer_s3prlfrontend_wavlm_raw_en_char/valid.acc.best.pth:encoder:s2t_model.encoder
+ - ../../wsj/asr1/exp/asr_train_asr_conformer_s3prlfrontend_wavlm_raw_en_char/valid.acc.best.pth:ctc:s2t_model.ctc
+ - ../../wsj/asr1/exp/asr_train_asr_conformer_s3prlfrontend_wavlm_raw_en_char/valid.acc.best.pth:decoder:s2t_model.decoder
+ ignore_init_mismatch: false
+ freeze_param:
+ - s2t_model.frontend.upstream
+ num_iters_per_epoch: 20000
+ batch_size: 2
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/enh_asr_stats_raw_en_char/train/speech_shape
+ - exp/enh_asr_stats_raw_en_char/train/speech_ref1_shape
+ - exp/enh_asr_stats_raw_en_char/train/text_spk1_shape.char
+ - exp/enh_asr_stats_raw_en_char/train/speech_ref2_shape
+ - exp/enh_asr_stats_raw_en_char/train/text_spk2_shape.char
+ valid_shape_file:
+ - exp/enh_asr_stats_raw_en_char/valid/speech_shape
+ - exp/enh_asr_stats_raw_en_char/valid/speech_ref1_shape
+ - exp/enh_asr_stats_raw_en_char/valid/text_spk1_shape.char
+ - exp/enh_asr_stats_raw_en_char/valid/speech_ref2_shape
+ - exp/enh_asr_stats_raw_en_char/valid/text_spk2_shape.char
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 80000
+ - 150
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ train_data_path_and_name_and_type:
+ -   - dump/raw/tr_spatialized_multi_multich_max_16k/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/tr_spatialized_multi_multich_max_16k/spk1.scp
+     - speech_ref1
+     - sound
+ -   - dump/raw/tr_spatialized_multi_multich_max_16k/text_spk1
+     - text_spk1
+     - text
+ -   - dump/raw/tr_spatialized_multi_multich_max_16k/spk2.scp
+     - speech_ref2
+     - sound
+ -   - dump/raw/tr_spatialized_multi_multich_max_16k/text_spk2
+     - text_spk2
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/cv_spatialized_multi_multich_max_16k/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/cv_spatialized_multi_multich_max_16k/spk1.scp
+     - speech_ref1
+     - sound
+ -   - dump/raw/cv_spatialized_multi_multich_max_16k/text_spk1
+     - text_spk1
+     - text
+ -   - dump/raw/cv_spatialized_multi_multich_max_16k/spk2.scp
+     - speech_ref2
+     - sound
+ -   - dump/raw/cv_spatialized_multi_multich_max_16k/text_spk2
+     - text_spk2
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: sgd
+ optim_conf:
+     lr: 0.001
+     momentum: 0.9
+ scheduler: null
+ scheduler_conf: {}
+ token_list: data/en_token_list/char/tokens.txt
+ src_token_list: null
+ init: xavier_uniform
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: null
+     zero_infinity: true
+ enh_criterions:
+ -   name: mr_l1_tfd
+     conf:
+         window_sz:
+         - 512
+         time_domain_weight: 1.0
+     wrapper: pit
+     wrapper_conf:
+         weight: 1.0
+ diar_num_spk: null
+ diar_input_size: null
+ enh_model_conf:
+     stft_consistency: false
+     loss_type: mask_mse
+     mask_type: null
+ asr_model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ st_model_conf:
+     stft_consistency: false
+     loss_type: mask_mse
+     mask_type: null
+ diar_model_conf:
+     diar_weight: 1.0
+     attractor_weight: 1.0
+ subtask_series:
+ - enh
+ - asr
+ model_conf:
+     bypass_enh_prob: 0.0
+     calc_enh_loss: false
+ use_preprocessor: true
+ token_type: char
+ bpemodel: null
+ src_token_type: bpe
+ src_bpemodel: null
+ non_linguistic_symbols: data/nlsyms.txt
+ cleaner: null
+ g2p: null
+ text_name:
+ - text_spk1
+ - text_spk2
+ enh_encoder: same
+ enh_encoder_conf: {}
+ enh_separator: tfgridnet
+ enh_separator_conf:
+     n_srcs: 2
+     n_fft: 512
+     stride: 256
+     window: hann
+     n_imics: 8
+     n_layers: 6
+     lstm_hidden_units: 192
+     attn_n_head: 4
+     attn_approx_qk_dim: 512
+     emb_dim: 48
+     emb_ks: 4
+     emb_hs: 2
+     activation: gelu
+     eps: 1.0e-05
+     ref_channel: 0
+ enh_decoder: same
+ enh_decoder_conf: {}
+ enh_mask_module: multi_mask
+ enh_mask_module_conf: {}
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: wavlm_large
+     download_dir: ./hub
+     multilayer_feature: true
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 100
+     num_freq_mask: 4
+     apply_time_mask: true
+     time_mask_width_range:
+     - 0
+     - 40
+     num_time_mask: 2
+ normalize: utterance_mvn
+ normalize_conf: {}
+ asr_preencoder: linear
+ asr_preencoder_conf:
+     input_size: 1024
+     output_size: 80
+ asr_encoder: conformer
+ asr_encoder_conf:
+     output_size: 256
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.0
+     input_layer: conv2d2
+     normalize_before: true
+     macaron_style: true
+     rel_pos_type: latest
+     pos_enc_layer_type: rel_pos
+     selfattention_layer_type: rel_selfattn
+     activation_type: swish
+     use_cnn_module: true
+     cnn_module_kernel: 15
+ asr_postencoder: null
+ asr_postencoder_conf: {}
+ asr_decoder: transformer
+ asr_decoder_conf:
+     input_layer: embed
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.0
+     src_attention_dropout_rate: 0.0
+ st_preencoder: null
+ st_preencoder_conf: {}
+ st_encoder: rnn
+ st_encoder_conf: {}
+ st_postencoder: null
+ st_postencoder_conf: {}
+ st_decoder: rnn
+ st_decoder_conf: {}
+ st_extra_asr_decoder: rnn
+ st_extra_asr_decoder_conf: {}
+ st_extra_mt_decoder: rnn
+ st_extra_mt_decoder_conf: {}
+ diar_frontend: default
+ diar_frontend_conf: {}
+ diar_specaug: null
+ diar_specaug_conf: {}
+ diar_normalize: utterance_mvn
+ diar_normalize_conf: {}
+ diar_encoder: transformer
+ diar_encoder_conf: {}
+ diar_decoder: linear
+ diar_decoder_conf: {}
+ label_aggregator: label_aggregator
+ label_aggregator_conf: {}
+ diar_attractor: null
+ diar_attractor_conf: {}
+ required:
+ - output_dir
+ version: '202304'
+ distributed: false
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
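Note on `init_param` in the README config above: each entry appears to follow a colon-separated `<path>:<src_key>:<dst_key>` layout, so the first entry would copy the `separator` weights of the standalone enhancement checkpoint into `enh_model.separator` of this model. The sketch below only illustrates that string layout. `parse_init_param` and `InitParam` are hypothetical names written for this note, not ESPnet functions, and the layout itself is an assumption read off the six entries in the config.

```python
from typing import NamedTuple, Optional


class InitParam(NamedTuple):
    """One parsed init_param entry (hypothetical container, not an ESPnet type)."""
    path: str
    src_key: Optional[str]
    dst_key: Optional[str]


def parse_init_param(spec: str) -> InitParam:
    """Split '<path>:<src_key>:<dst_key>' into its parts.

    Assumes the checkpoint path itself contains no colon.
    """
    parts = spec.split(":")
    path = parts[0]
    src_key = parts[1] if len(parts) > 1 and parts[1] else None
    dst_key = parts[2] if len(parts) > 2 and parts[2] else None
    return InitParam(path, src_key, dst_key)


# First init_param entry from the config in this commit.
spec = (
    "../enh1/exp/enh_train_enh_tfgridnet_waspaa2023_raw/"
    "valid.loss.best.pth:separator:enh_model.separator"
)
parsed = parse_init_param(spec)
print(parsed.src_key, "->", parsed.dst_key)
```

Read this way, the remaining five entries load the frontend, preencoder, encoder, CTC, and decoder from the WSJ ASR checkpoint into the matching `s2t_model.*` submodules.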
data/nlsyms.txt ADDED
@@ -0,0 +1,3 @@
+ <*IN*>
+ <*MR.*>
+ <NOISE>
espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/config.yaml ADDED
@@ -0,0 +1,193 @@
+ config: conf/train_lm_transformer.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/lm_train_lm_transformer_en_char
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 25
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - loss
+     - min
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 2
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 350000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/lm_stats_en_char/train/text_shape.char
+ valid_shape_file:
+ - exp/lm_stats_en_char/valid/text_shape.char
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ train_data_path_and_name_and_type:
+ -   - dump/raw/lm_train.txt
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/org/test_dev93/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.001
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 25000
+ token_list:
+ - <blank>
+ - <unk>
+ - <space>
+ - E
+ - T
+ - A
+ - N
+ - I
+ - O
+ - S
+ - R
+ - H
+ - L
+ - D
+ - C
+ - U
+ - M
+ - P
+ - F
+ - G
+ - Y
+ - W
+ - B
+ - V
+ - K
+ - .
+ - X
+ - ''''
+ - J
+ - Q
+ - Z
+ - <NOISE>
+ - ','
+ - '-'
+ - '"'
+ - '*'
+ - ':'
+ - (
+ - )
+ - '?'
+ - '!'
+ - '&'
+ - ;
+ - '1'
+ - '2'
+ - '0'
+ - /
+ - $
+ - '{'
+ - '}'
+ - '8'
+ - '9'
+ - '6'
+ - '3'
+ - '5'
+ - '7'
+ - '4'
+ - '~'
+ - '`'
+ - _
+ - <*IN*>
+ - <*MR.*>
+ - \
+ - ^
+ - <sos/eos>
+ init: null
+ model_conf:
+     ignore_id: 0
+ use_preprocessor: true
+ token_type: char
+ bpemodel: null
+ non_linguistic_symbols: data/nlsyms.txt
+ cleaner: null
+ g2p: null
+ lm: transformer
+ lm_conf:
+     pos_enc: null
+     embed_unit: 128
+     att_unit: 512
+     head: 8
+     unit: 2048
+     layer: 16
+     dropout_rate: 0.1
+ required:
+ - output_dir
+ - token_list
+ version: '202304'
+ distributed: false
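Note on the LM schedule: the config pairs `optim: adam` (`lr: 0.001`) with `scheduler: warmuplr` and `warmup_steps: 25000`. Assuming `warmuplr` follows the usual Noam-style rule, lr(step) = base_lr * warmup^0.5 * min(step^-0.5, step * warmup^-1.5), the rate would rise linearly to 0.001 at step 25000 and then decay with the inverse square root of the step. The function below is a standalone sketch of that assumed rule, not code taken from ESPnet; `warmup_lr` is a hypothetical name.

```python
def warmup_lr(step: int, base_lr: float = 0.001, warmup_steps: int = 25000) -> float:
    """Learning rate under the assumed Noam-style warmup rule (steps count from 1)."""
    if step < 1:
        raise ValueError("step counts from 1")
    # Linear ramp while step < warmup_steps, inverse-sqrt decay afterwards.
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


for step in (1, 12500, 25000, 100000):
    print(step, f"{warmup_lr(step):.6g}")
```

Under this assumption the peak equals the configured `lr`, so `lr: 0.001` is the maximum rate rather than the starting rate.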
espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/backward_time.png ADDED
espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/clip.png ADDED
espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/forward_time.png ADDED
espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/gpu_max_cached_mem_GB.png ADDED
espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/grad_norm.png ADDED
espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/iter_time.png ADDED
espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/loss.png ADDED
espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/loss_scale.png ADDED
espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/optim0_lr0.png ADDED
espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/optim_step_time.png ADDED
espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/images/train_time.png ADDED
espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/perplexity_test/ppl ADDED
@@ -0,0 +1 @@
+ 2.188535511410594
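Note on the `ppl` value: perplexity is conventionally the exponential of the average negative log-likelihood per token. Since this LM uses `token_type: char`, the figure would be a per-character perplexity. Assuming the natural logarithm was used, the stored value converts back to a per-character loss as sketched here.

```python
import math

# Value stored in perplexity_test/ppl in this commit.
ppl = 2.188535511410594

# Assumption: ppl = exp(mean negative log-likelihood), natural log.
nats_per_char = math.log(ppl)
bits_per_char = math.log2(ppl)

print(f"{nats_per_char:.4f} nats/char, {bits_per_char:.4f} bits/char")
```

That works out to roughly 0.78 nats, or about 1.13 bits, per character on the test text.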
espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/valid.loss.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2fb9bd148f71a9fc77317bc243d42e24dd8fc9fca1015b445613864bc1dbee39
+ size 202290031
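Note on the `.pth` entries: the diff shows Git LFS pointer files, not the weights themselves. Each pointer is three `key value` lines giving the spec version, the object hash, and the size in bytes. The sketch below parses such a pointer. `parse_lfs_pointer` is a hypothetical helper written for this note, and it handles only the three keys seen in this commit.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer with 'version', 'oid' and 'size' lines."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is '<key> <value>', separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer for valid.loss.ave_10best.pth as shown in this commit.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2fb9bd148f71a9fc77317bc243d42e24dd8fc9fca1015b445613864bc1dbee39
size 202290031
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], f"{info['size_bytes'] / 2**20:.1f} MiB")
```

By the same reading, the LM checkpoint is about 193 MiB, and the `8epoch.pth` pointer further down (`size 1473035049`) describes a file of about 1.4 GiB.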
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/8epoch.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a6541503598666ac8455c6856d1f7a0b28213aa0df5652aefd15b57b6abf8182
+ size 1473035049
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/RESULTS.md ADDED
@@ -0,0 +1,29 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Sun Aug 13 19:05:53 UTC 2023`
+ - python version: `3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]`
+ - espnet version: `espnet 202304`
+ - pytorch version: `pytorch 1.10.1+cu111`
+ - Git hash: ``
+ - Commit date: ``
+
+ ## exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.best/tt_spatialized_anechoic_multich_max_16k|6000|98613|98.7|1.2|0.1|0.5|1.7|16.5|
+ |decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.best/tt_spatialized_reverb_multich_max_16k|6000|98613|98.7|1.3|0.1|0.4|1.7|17.8|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.best/tt_spatialized_anechoic_multich_max_16k|6000|598296|99.6|0.2|0.2|0.3|0.7|21.6|
+ |decode_asr_transformer_normalize_output_wavtrue_lm_lm_train_lm_transformer_en_char_valid.loss.ave_enh_asr_model_valid.acc.best/tt_spatialized_reverb_multich_max_16k|6000|598296|99.6|0.2|0.3|0.3|0.7|23.0|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
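Note on reading the WER/CER tables: in this scoring format `Err` is the sum of substitutions, deletions, and insertions as a percentage of reference tokens. The printed columns do not always add up exactly (for example 1.2 + 0.1 + 0.5 = 1.8 against a reported 1.7), which is what independent rounding of each column to one decimal would produce. The check below is a small sketch of that reasoning. The tolerance of 0.2 is an assumption derived from four values each rounded to 0.1.

```python
# (Sub, Del, Ins, Err) percentages copied from the WER and CER tables.
rows = {
    "WER anechoic": (1.2, 0.1, 0.5, 1.7),
    "WER reverb": (1.3, 0.1, 0.4, 1.7),
    "CER anechoic": (0.2, 0.2, 0.3, 0.7),
    "CER reverb": (0.2, 0.3, 0.3, 0.7),
}

# Four values rounded to one decimal can disagree by up to 4 * 0.05.
TOLERANCE = 0.2


def rounding_gap(sub: float, dele: float, ins: float, err: float) -> float:
    """Absolute difference between Sub + Del + Ins and the reported Err."""
    return abs((sub + dele + ins) - err)


for name, values in rows.items():
    gap = rounding_gap(*values)
    status = "consistent" if gap <= TOLERANCE + 1e-9 else "inconsistent"
    print(f"{name}: gap {gap:.1f} -> {status}")
```

All four rows stay within the rounding tolerance, so the tables are internally consistent.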
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/RESULTS_enh.md ADDED
@@ -0,0 +1,20 @@
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Sun Aug 13 19:18:04 UTC 2023`
+ - python version: `3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]`
+ - espnet version: `espnet 202304`
+ - pytorch version: `pytorch 1.10.1+cu111`
+ - Git hash: ``
+ - Commit date: ``
+
+
+ ## enh_asr_train_enh_asr_tfgridnet_waspaa_raw_en_char
+
+ config: conf/tuning/train_enh_asr_tfgridnet_waspaa2023.yaml
+
+ |dataset|STOI|SAR|SDR|SIR|SI_SNR|
+ |---|---|---|---|---|---|
+ |enhanced_tt_spatialized_anechoic_multich_max_16k|97.73|13.36|13.19|29.59|12.63|
+ |enhanced_tt_spatialized_reverb_multich_max_16k|95.15|11.68|11.47|26.41|10.91|
+
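Note on the `SI_SNR` column: scale-invariant SNR is commonly defined by projecting the zero-mean estimate onto the zero-mean reference and comparing the energy of that projection with the energy of the residual, in dB. The pure-Python sketch below follows that common definition. It is not the scoring code that produced the table, and `si_snr_db` is a hypothetical name.

```python
import math
from typing import Sequence


def si_snr_db(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Scale-invariant SNR in dB for two equal-length single-channel signals.

    Raises ZeroDivisionError for a silent reference or a perfect estimate.
    """
    if len(estimate) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    # Remove the mean of each signal.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to get the target component.
    scale = sum(e * r for e, r in zip(est, ref)) / sum(r * r for r in ref)
    target = [scale * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10(target_energy / noise_energy)


reference = [1.0, -1.0, 1.0, -1.0]
estimate = [1.1, -0.9, 0.9, -1.1]  # reference plus a small error
print(f"{si_snr_db(estimate, reference):.2f} dB")
# Rescaling the estimate leaves the score unchanged.
print(f"{si_snr_db([2 * x for x in estimate], reference):.2f} dB")
```

The toy signals give an energy ratio of 100, that is 20 dB, and doubling the estimate's amplitude does not change the score, which is the property that separates SI-SNR from a plain SNR.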
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/config.yaml ADDED
@@ -0,0 +1,315 @@
+ config: conf/tuning/train_enh_asr_tfgridnet_waspaa.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/enh_asr_train_enh_asr_tfgridnet_waspaa_raw_en_char
+ ngpu: 1
+ seed: 0
+ num_workers: 0
+ num_att_plot: 0
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 11
+ patience: 10
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - acc
+     - max
+ -   - train
+     - loss
+     - min
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 4
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param:
+ - ../enh1/exp/enh_train_enh_tfgridnet_waspaa2023_raw/valid.loss.best.pth:separator:enh_model.separator
+ - ../../wsj/asr1/exp/asr_train_asr_conformer_s3prlfrontend_wavlm_raw_en_char/valid.acc.best.pth:frontend:s2t_model.frontend
+ - ../../wsj/asr1/exp/asr_train_asr_conformer_s3prlfrontend_wavlm_raw_en_char/valid.acc.best.pth:preencoder:s2t_model.preencoder
+ - ../../wsj/asr1/exp/asr_train_asr_conformer_s3prlfrontend_wavlm_raw_en_char/valid.acc.best.pth:encoder:s2t_model.encoder
+ - ../../wsj/asr1/exp/asr_train_asr_conformer_s3prlfrontend_wavlm_raw_en_char/valid.acc.best.pth:ctc:s2t_model.ctc
+ - ../../wsj/asr1/exp/asr_train_asr_conformer_s3prlfrontend_wavlm_raw_en_char/valid.acc.best.pth:decoder:s2t_model.decoder
+ ignore_init_mismatch: false
+ freeze_param:
+ - s2t_model.frontend.upstream
+ num_iters_per_epoch: 20000
+ batch_size: 2
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/enh_asr_stats_raw_en_char/train/speech_shape
+ - exp/enh_asr_stats_raw_en_char/train/speech_ref1_shape
+ - exp/enh_asr_stats_raw_en_char/train/text_spk1_shape.char
+ - exp/enh_asr_stats_raw_en_char/train/speech_ref2_shape
+ - exp/enh_asr_stats_raw_en_char/train/text_spk2_shape.char
+ valid_shape_file:
+ - exp/enh_asr_stats_raw_en_char/valid/speech_shape
+ - exp/enh_asr_stats_raw_en_char/valid/speech_ref1_shape
+ - exp/enh_asr_stats_raw_en_char/valid/text_spk1_shape.char
+ - exp/enh_asr_stats_raw_en_char/valid/speech_ref2_shape
+ - exp/enh_asr_stats_raw_en_char/valid/text_spk2_shape.char
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 80000
+ - 150
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ train_data_path_and_name_and_type:
+ -   - dump/raw/tr_spatialized_multi_multich_max_16k/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/tr_spatialized_multi_multich_max_16k/spk1.scp
+     - speech_ref1
+     - sound
+ -   - dump/raw/tr_spatialized_multi_multich_max_16k/text_spk1
+     - text_spk1
+     - text
+ -   - dump/raw/tr_spatialized_multi_multich_max_16k/spk2.scp
+     - speech_ref2
+     - sound
+ -   - dump/raw/tr_spatialized_multi_multich_max_16k/text_spk2
+     - text_spk2
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/cv_spatialized_multi_multich_max_16k/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/cv_spatialized_multi_multich_max_16k/spk1.scp
+     - speech_ref1
+     - sound
+ -   - dump/raw/cv_spatialized_multi_multich_max_16k/text_spk1
+     - text_spk1
+     - text
+ -   - dump/raw/cv_spatialized_multi_multich_max_16k/spk2.scp
+     - speech_ref2
+     - sound
+ -   - dump/raw/cv_spatialized_multi_multich_max_16k/text_spk2
+     - text_spk2
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: sgd
+ optim_conf:
+     lr: 0.001
+     momentum: 0.9
+ scheduler: null
+ scheduler_conf: {}
+ token_list: data/en_token_list/char/tokens.txt
+ src_token_list: null
+ init: xavier_uniform
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: null
+     zero_infinity: true
+ enh_criterions:
+ -   name: mr_l1_tfd
+     conf:
+         window_sz:
+         - 512
+         time_domain_weight: 1.0
+     wrapper: pit
+     wrapper_conf:
+         weight: 1.0
+ diar_num_spk: null
+ diar_input_size: null
+ enh_model_conf:
+     stft_consistency: false
+     loss_type: mask_mse
+     mask_type: null
+ asr_model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ st_model_conf:
+     stft_consistency: false
+     loss_type: mask_mse
+     mask_type: null
+ diar_model_conf:
+     diar_weight: 1.0
+     attractor_weight: 1.0
+ subtask_series:
+ - enh
+ - asr
+ model_conf:
+     bypass_enh_prob: 0.0
+     calc_enh_loss: false
+ use_preprocessor: true
+ token_type: char
+ bpemodel: null
+ src_token_type: bpe
+ src_bpemodel: null
+ non_linguistic_symbols: data/nlsyms.txt
+ cleaner: null
+ g2p: null
+ text_name:
+ - text_spk1
+ - text_spk2
205
+ enh_encoder: same
+ enh_encoder_conf: {}
+ enh_separator: tfgridnet
+ enh_separator_conf:
+   n_srcs: 2
+   n_fft: 512
+   stride: 256
+   window: hann
+   n_imics: 8
+   n_layers: 6
+   lstm_hidden_units: 192
+   attn_n_head: 4
+   attn_approx_qk_dim: 512
+   emb_dim: 48
+   emb_ks: 4
+   emb_hs: 2
+   activation: gelu
+   eps: 1.0e-05
+   ref_channel: 0
+ enh_decoder: same
+ enh_decoder_conf: {}
+ enh_mask_module: multi_mask
+ enh_mask_module_conf: {}
+ frontend: s3prl
+ frontend_conf:
+   frontend_conf:
+     upstream: wavlm_large
+   download_dir: ./hub
+   multilayer_feature: true
+   fs: 16k
+ specaug: specaug
+ specaug_conf:
+   apply_time_warp: true
+   time_warp_window: 5
+   time_warp_mode: bicubic
+   apply_freq_mask: true
+   freq_mask_width_range:
+   - 0
+   - 100
+   num_freq_mask: 4
+   apply_time_mask: true
+   time_mask_width_range:
+   - 0
+   - 40
+   num_time_mask: 2
+ normalize: utterance_mvn
+ normalize_conf: {}
+ asr_preencoder: linear
+ asr_preencoder_conf:
+   input_size: 1024
+   output_size: 80
+ asr_encoder: conformer
+ asr_encoder_conf:
+   output_size: 256
+   attention_heads: 4
+   linear_units: 2048
+   num_blocks: 12
+   dropout_rate: 0.1
+   positional_dropout_rate: 0.1
+   attention_dropout_rate: 0.0
+   input_layer: conv2d2
+   normalize_before: true
+   macaron_style: true
+   rel_pos_type: latest
+   pos_enc_layer_type: rel_pos
+   selfattention_layer_type: rel_selfattn
+   activation_type: swish
+   use_cnn_module: true
+   cnn_module_kernel: 15
+ asr_postencoder: null
+ asr_postencoder_conf: {}
+ asr_decoder: transformer
+ asr_decoder_conf:
+   input_layer: embed
+   attention_heads: 4
+   linear_units: 2048
+   num_blocks: 6
+   dropout_rate: 0.1
+   positional_dropout_rate: 0.1
+   self_attention_dropout_rate: 0.0
+   src_attention_dropout_rate: 0.0
+ st_preencoder: null
+ st_preencoder_conf: {}
+ st_encoder: rnn
+ st_encoder_conf: {}
+ st_postencoder: null
+ st_postencoder_conf: {}
+ st_decoder: rnn
+ st_decoder_conf: {}
+ st_extra_asr_decoder: rnn
+ st_extra_asr_decoder_conf: {}
+ st_extra_mt_decoder: rnn
+ st_extra_mt_decoder_conf: {}
+ diar_frontend: default
+ diar_frontend_conf: {}
+ diar_specaug: null
+ diar_specaug_conf: {}
+ diar_normalize: utterance_mvn
+ diar_normalize_conf: {}
+ diar_encoder: transformer
+ diar_encoder_conf: {}
+ diar_decoder: linear
+ diar_decoder_conf: {}
+ label_aggregator: label_aggregator
+ label_aggregator_conf: {}
+ diar_attractor: null
+ diar_attractor_conf: {}
+ required:
+ - output_dir
+ version: '202304'
+ distributed: false
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/acc.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/backward_time.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/cer.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/cer_ctc.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/clip.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/forward_time.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/gpu_max_cached_mem_GB.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/grad_norm.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/iter_time.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/loss.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/loss_asr.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/loss_att.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/loss_ctc.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/loss_enh.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/loss_scale.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/optim0_lr0.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/optim_step_time.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/train_time.png ADDED
exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/images/wer.png ADDED
meta.yaml ADDED
@@ -0,0 +1,10 @@
+ espnet: '202304'
+ files:
+   enh_s2t_model_file: exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/8epoch.pth
+   lm_file: /espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/valid.loss.ave_10best.pth
+ python: "3.7.4 (default, Aug 13 2019, 20:35:49) \n[GCC 7.3.0]"
+ timestamp: 1692067642.684803
+ torch: 1.10.1+cu111
+ yaml_files:
+   enh_s2t_train_config: exp/enh_asr_train_enh_asr_tfgridnet_waspaa2023_raw_en_char/config.yaml
+   lm_train_config: /espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/config.yaml
+ lm_train_config: /espnet/egs2/wsj/asr1/exp/lm_train_lm_transformer_en_char/config.yaml