Automatic Speech Recognition · ESPnet · English · audio

pyf98 committed
Commit 9ebf14a
1 Parent(s): ffac12b

add models

Files changed (32):
  1. README.md +358 -0
  2. data/nlsyms.txt +3 -0
  3. exp/asr_stats_raw_en_char/train/feats_stats.npz +3 -0
  4. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/RESULTS.md +29 -0
  5. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/config.yaml +253 -0
  6. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/acc.png +0 -0
  7. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/backward_time.png +0 -0
  8. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/cer.png +0 -0
  9. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/cer_ctc.png +0 -0
  10. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/forward_time.png +0 -0
  11. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/gpu_max_cached_mem_GB.png +0 -0
  12. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/iter_time.png +0 -0
  13. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/loss.png +0 -0
  14. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/loss_att.png +0 -0
  15. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/loss_ctc.png +0 -0
  16. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/optim0_lr0.png +0 -0
  17. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/optim_step_time.png +0 -0
  18. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/train_time.png +0 -0
  19. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/wer.png +0 -0
  20. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/valid.acc.ave_10best.pth +3 -0
  21. exp/lm_train_lm_transformer_en_char/config.yaml +190 -0
  22. exp/lm_train_lm_transformer_en_char/images/backward_time.png +0 -0
  23. exp/lm_train_lm_transformer_en_char/images/forward_time.png +0 -0
  24. exp/lm_train_lm_transformer_en_char/images/gpu_max_cached_mem_GB.png +0 -0
  25. exp/lm_train_lm_transformer_en_char/images/iter_time.png +0 -0
  26. exp/lm_train_lm_transformer_en_char/images/loss.png +0 -0
  27. exp/lm_train_lm_transformer_en_char/images/optim0_lr0.png +0 -0
  28. exp/lm_train_lm_transformer_en_char/images/optim_step_time.png +0 -0
  29. exp/lm_train_lm_transformer_en_char/images/train_time.png +0 -0
  30. exp/lm_train_lm_transformer_en_char/perplexity_test/ppl +1 -0
  31. exp/lm_train_lm_transformer_en_char/valid.loss.ave_10best.pth +3 -0
  32. meta.yaml +10 -0
README.md ADDED
@@ -0,0 +1,358 @@
---
tags:
- espnet
- audio
- automatic-speech-recognition
language: en
datasets:
- wsj
license: cc-by-4.0
---

## ESPnet2 ASR model

### `pyf98/wsj_e_branchformer`

This model was trained by Yifan Peng using the wsj recipe in [espnet](https://github.com/espnet/espnet/).

References:
- [E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition (SLT 2022)](https://arxiv.org/abs/2210.00077)
- [Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding (ICML 2022)](https://proceedings.mlr.press/v162/peng22a.html)

### Demo: How to use in ESPnet2

Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
if you haven't done that already.

```bash
cd espnet
git checkout 0aa06d0535323aabc1d8b057f8769da377f4d9ff
pip install -e .
cd egs2/wsj/asr1
./run.sh --skip_data_prep false --skip_train true --download_model pyf98/wsj_e_branchformer
```

<!-- Generated by scripts/utils/show_asr_result.sh -->
# RESULTS
## Environments
- date: `Wed Dec 28 00:12:25 EST 2022`
- python version: `3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]`
- espnet version: `espnet 202211`
- pytorch version: `pytorch 1.12.1`
- Git hash: `0aa06d0535323aabc1d8b057f8769da377f4d9ff`
- Commit date: `Tue Dec 27 15:08:25 2022 -0600`

## asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char
### WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/test_dev93|503|8234|94.3|4.9|0.8|0.7|6.5|51.9|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/test_eval92|333|5643|96.4|3.3|0.3|0.7|4.3|38.1|

### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/test_dev93|503|48634|97.8|1.0|1.1|0.6|2.8|58.3|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/test_eval92|333|33341|98.7|0.7|0.7|0.5|1.8|46.5|

### TER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|

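In these tables, the Err column is the error rate: insertions, deletions, and substitutions summed and divided by the number of reference words (Wrd), as a percentage. A quick sanity check in Python (the raw counts below are hypothetical, chosen only to reproduce the rounded `test_eval92` WER row; the actual alignment counts are not in this card):

```python
# WER = 100 * (substitutions + deletions + insertions) / reference words.
# The per-column percentages in the tables are each rounded to one decimal.
def error_rate(sub: int, dele: int, ins: int, n_ref: int) -> float:
    return 100.0 * (sub + dele + ins) / n_ref

# Hypothetical counts over the 5643 reference words of test_eval92:
# 186 sub (3.3%), 17 del (0.3%), 40 ins (0.7%) -> 4.3% overall
print(round(error_rate(186, 17, 40, 5643), 1))
```

S.Err is the analogous sentence-level rate: the fraction of the Snt utterances with at least one error.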
## ASR config

<details><summary>expand</summary>

```
config: conf/tuning/train_asr_e_branchformer_e12_mlp1024_linear1024.yaml
print_config: false
log_level: INFO
dry_run: false
iterator_type: sequence
output_dir: exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char
ngpu: 1
seed: 0
num_workers: 4
num_att_plot: 3
dist_backend: nccl
dist_init_method: env://
dist_world_size: null
dist_rank: null
local_rank: 0
dist_master_addr: null
dist_master_port: null
dist_launcher: null
multiprocessing_distributed: false
unused_parameters: false
sharded_ddp: false
cudnn_enabled: true
cudnn_benchmark: false
cudnn_deterministic: true
collect_stats: false
write_collected_feats: false
max_epoch: 100
patience: null
val_scheduler_criterion:
- valid
- loss
early_stopping_criterion:
- valid
- loss
- min
best_model_criterion:
- - valid
  - acc
  - max
keep_nbest_models: 10
nbest_averaging_interval: 0
grad_clip: 5.0
grad_clip_type: 2.0
grad_noise: false
accum_grad: 2
no_forward_run: false
resume: true
train_dtype: float32
use_amp: true
log_interval: 100
use_matplotlib: true
use_tensorboard: true
create_graph_in_tensorboard: false
use_wandb: false
wandb_project: null
wandb_id: null
wandb_entity: null
wandb_name: null
wandb_model_log_interval: -1
detect_anomaly: false
pretrain_path: null
init_param: []
ignore_init_mismatch: false
freeze_param: []
num_iters_per_epoch: null
batch_size: 128
valid_batch_size: null
batch_bins: 1000000
valid_batch_bins: null
train_shape_file:
- exp/asr_stats_raw_en_char/train/speech_shape
- exp/asr_stats_raw_en_char/train/text_shape.char
valid_shape_file:
- exp/asr_stats_raw_en_char/valid/speech_shape
- exp/asr_stats_raw_en_char/valid/text_shape.char
batch_type: folded
valid_batch_type: null
fold_length:
- 80000
- 150
sort_in_batch: descending
sort_batch: descending
multiple_iterator: false
chunk_length: 500
chunk_shift_ratio: 0.5
num_cache_chunks: 1024
train_data_path_and_name_and_type:
- - dump/raw/train_si284/wav.scp
  - speech
  - sound
- - dump/raw/train_si284/text
  - text
  - text
valid_data_path_and_name_and_type:
- - dump/raw/test_dev93/wav.scp
  - speech
  - sound
- - dump/raw/test_dev93/text
  - text
  - text
allow_variable_data_keys: false
max_cache_size: 0.0
max_cache_fd: 32
valid_max_cache_size: null
optim: adam
optim_conf:
    lr: 0.005
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 30000
token_list:
- <blank>
- <unk>
- <space>
- E
- T
- A
- N
- I
- O
- S
- R
- H
- L
- D
- C
- U
- M
- P
- F
- G
- Y
- W
- B
- V
- K
- .
- X
- ''''
- J
- Q
- Z
- <NOISE>
- ','
- '-'
- '"'
- '*'
- ':'
- (
- )
- '?'
- '!'
- '&'
- ;
- '1'
- '2'
- '0'
- /
- $
- '{'
- '}'
- '8'
- '9'
- '6'
- '3'
- '5'
- '7'
- '4'
- '~'
- '`'
- _
- <*IN*>
- <*MR.*>
- \
- ^
- <sos/eos>
init: null
input_size: null
ctc_conf:
    dropout_rate: 0.0
    ctc_type: builtin
    reduce: true
    ignore_nan_grad: null
    zero_infinity: true
joint_net_conf: null
use_preprocessor: true
token_type: char
bpemodel: null
non_linguistic_symbols: data/nlsyms.txt
cleaner: null
g2p: null
speech_volume_normalize: null
rir_scp: null
rir_apply_prob: 1.0
noise_scp: null
noise_apply_prob: 1.0
noise_db_range: '13_15'
short_noise_thres: 0.5
frontend: default
frontend_conf:
    fs: 16k
specaug: null
specaug_conf: {}
normalize: global_mvn
normalize_conf:
    stats_file: exp/asr_stats_raw_en_char/train/feats_stats.npz
model: espnet
model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1
    length_normalized_loss: false
preencoder: null
preencoder_conf: {}
encoder: e_branchformer
encoder_conf:
    output_size: 256
    attention_heads: 4
    attention_layer_type: rel_selfattn
    pos_enc_layer_type: rel_pos
    rel_pos_type: latest
    cgmlp_linear_units: 1024
    cgmlp_conv_kernel: 31
    use_linear_after_conv: false
    gate_activation: identity
    num_blocks: 12
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.1
    input_layer: conv2d
    layer_drop_rate: 0.0
    linear_units: 1024
    positionwise_layer_type: linear
    use_ffn: true
    macaron_ffn: true
    merge_conv_kernel: 31
postencoder: null
postencoder_conf: {}
decoder: transformer
decoder_conf:
    attention_heads: 4
    linear_units: 2048
    num_blocks: 6
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.0
    src_attention_dropout_rate: 0.0
preprocessor: default
preprocessor_conf: {}
required:
- output_dir
- token_list
version: '202211'
distributed: false
```

</details>
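With `token_type: char`, transcripts are modeled as single characters: spaces become the `<space>` token, and the entries of `data/nlsyms.txt` (listed near the end of `token_list`) are protected as single tokens. A minimal sketch of that mapping (an illustration of the scheme, not ESPnet's actual tokenizer code):

```python
# Illustrative character tokenizer: non-linguistic symbols are kept whole,
# every other character becomes its own token, and ' ' maps to <space>.
NLSYMS = ["<*IN*>", "<*MR.*>", "<NOISE>"]  # contents of data/nlsyms.txt

def char_tokenize(text: str) -> list:
    tokens = []
    i = 0
    while i < len(text):
        for sym in NLSYMS:
            if text.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            tokens.append("<space>" if text[i] == " " else text[i])
            i += 1
    return tokens

print(char_tokenize("HI <NOISE> WSJ"))
```

The `<blank>` token is reserved for CTC and `<sos/eos>` for the attention decoder.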

### Citing ESPnet

```bibtex
@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
```

or arXiv:

```bibtex
@misc{watanabe2018espnet,
  title={ESPnet: End-to-End Speech Processing Toolkit},
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  year={2018},
  eprint={1804.00015},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
data/nlsyms.txt ADDED
@@ -0,0 +1,3 @@
<*IN*>
<*MR.*>
<NOISE>
exp/asr_stats_raw_en_char/train/feats_stats.npz ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0ea03fa4eea91ad6b7e047a7572be73ed998be1896e389935de240c68ccc1931
size 1402
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/RESULTS.md ADDED
@@ -0,0 +1,29 @@
<!-- Generated by scripts/utils/show_asr_result.sh -->
# RESULTS
## Environments
- date: `Wed Dec 28 00:12:25 EST 2022`
- python version: `3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]`
- espnet version: `espnet 202211`
- pytorch version: `pytorch 1.12.1`
- Git hash: `0aa06d0535323aabc1d8b057f8769da377f4d9ff`
- Commit date: `Tue Dec 27 15:08:25 2022 -0600`

## asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char
### WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/test_dev93|503|8234|94.3|4.9|0.8|0.7|6.5|51.9|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/test_eval92|333|5643|96.4|3.3|0.3|0.7|4.3|38.1|

### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/test_dev93|503|48634|97.8|1.0|1.1|0.6|2.8|58.3|
|decode_lm_lm_train_lm_transformer_en_char_valid.loss.ave_asr_model_valid.acc.ave/test_eval92|333|33341|98.7|0.7|0.7|0.5|1.8|46.5|

### TER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/config.yaml ADDED
@@ -0,0 +1,253 @@
config: conf/tuning/train_asr_e_branchformer_e12_mlp1024_linear1024.yaml
print_config: false
log_level: INFO
dry_run: false
iterator_type: sequence
output_dir: exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char
ngpu: 1
seed: 0
num_workers: 4
num_att_plot: 3
dist_backend: nccl
dist_init_method: env://
dist_world_size: null
dist_rank: null
local_rank: 0
dist_master_addr: null
dist_master_port: null
dist_launcher: null
multiprocessing_distributed: false
unused_parameters: false
sharded_ddp: false
cudnn_enabled: true
cudnn_benchmark: false
cudnn_deterministic: true
collect_stats: false
write_collected_feats: false
max_epoch: 100
patience: null
val_scheduler_criterion:
- valid
- loss
early_stopping_criterion:
- valid
- loss
- min
best_model_criterion:
- - valid
  - acc
  - max
keep_nbest_models: 10
nbest_averaging_interval: 0
grad_clip: 5.0
grad_clip_type: 2.0
grad_noise: false
accum_grad: 2
no_forward_run: false
resume: true
train_dtype: float32
use_amp: true
log_interval: 100
use_matplotlib: true
use_tensorboard: true
create_graph_in_tensorboard: false
use_wandb: false
wandb_project: null
wandb_id: null
wandb_entity: null
wandb_name: null
wandb_model_log_interval: -1
detect_anomaly: false
pretrain_path: null
init_param: []
ignore_init_mismatch: false
freeze_param: []
num_iters_per_epoch: null
batch_size: 128
valid_batch_size: null
batch_bins: 1000000
valid_batch_bins: null
train_shape_file:
- exp/asr_stats_raw_en_char/train/speech_shape
- exp/asr_stats_raw_en_char/train/text_shape.char
valid_shape_file:
- exp/asr_stats_raw_en_char/valid/speech_shape
- exp/asr_stats_raw_en_char/valid/text_shape.char
batch_type: folded
valid_batch_type: null
fold_length:
- 80000
- 150
sort_in_batch: descending
sort_batch: descending
multiple_iterator: false
chunk_length: 500
chunk_shift_ratio: 0.5
num_cache_chunks: 1024
train_data_path_and_name_and_type:
- - dump/raw/train_si284/wav.scp
  - speech
  - sound
- - dump/raw/train_si284/text
  - text
  - text
valid_data_path_and_name_and_type:
- - dump/raw/test_dev93/wav.scp
  - speech
  - sound
- - dump/raw/test_dev93/text
  - text
  - text
allow_variable_data_keys: false
max_cache_size: 0.0
max_cache_fd: 32
valid_max_cache_size: null
optim: adam
optim_conf:
    lr: 0.005
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 30000
token_list:
- <blank>
- <unk>
- <space>
- E
- T
- A
- N
- I
- O
- S
- R
- H
- L
- D
- C
- U
- M
- P
- F
- G
- Y
- W
- B
- V
- K
- .
- X
- ''''
- J
- Q
- Z
- <NOISE>
- ','
- '-'
- '"'
- '*'
- ':'
- (
- )
- '?'
- '!'
- '&'
- ;
- '1'
- '2'
- '0'
- /
- $
- '{'
- '}'
- '8'
- '9'
- '6'
- '3'
- '5'
- '7'
- '4'
- '~'
- '`'
- _
- <*IN*>
- <*MR.*>
- \
- ^
- <sos/eos>
init: null
input_size: null
ctc_conf:
    dropout_rate: 0.0
    ctc_type: builtin
    reduce: true
    ignore_nan_grad: null
    zero_infinity: true
joint_net_conf: null
use_preprocessor: true
token_type: char
bpemodel: null
non_linguistic_symbols: data/nlsyms.txt
cleaner: null
g2p: null
speech_volume_normalize: null
rir_scp: null
rir_apply_prob: 1.0
noise_scp: null
noise_apply_prob: 1.0
noise_db_range: '13_15'
short_noise_thres: 0.5
frontend: default
frontend_conf:
    fs: 16k
specaug: null
specaug_conf: {}
normalize: global_mvn
normalize_conf:
    stats_file: exp/asr_stats_raw_en_char/train/feats_stats.npz
model: espnet
model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1
    length_normalized_loss: false
preencoder: null
preencoder_conf: {}
encoder: e_branchformer
encoder_conf:
    output_size: 256
    attention_heads: 4
    attention_layer_type: rel_selfattn
    pos_enc_layer_type: rel_pos
    rel_pos_type: latest
    cgmlp_linear_units: 1024
    cgmlp_conv_kernel: 31
    use_linear_after_conv: false
    gate_activation: identity
    num_blocks: 12
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.1
    input_layer: conv2d
    layer_drop_rate: 0.0
    linear_units: 1024
    positionwise_layer_type: linear
    use_ffn: true
    macaron_ffn: true
    merge_conv_kernel: 31
postencoder: null
postencoder_conf: {}
decoder: transformer
decoder_conf:
    attention_heads: 4
    linear_units: 2048
    num_blocks: 6
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.0
    src_attention_dropout_rate: 0.0
preprocessor: default
preprocessor_conf: {}
required:
- output_dir
- token_list
version: '202211'
distributed: false
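This config pairs Adam with `scheduler: warmuplr`. ESPnet's WarmupLR scales the base learning rate by `warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)`: a linear ramp up to the base lr at `warmup_steps`, then inverse-square-root decay. A small sketch with this config's values (the formula is restated here from the scheduler's usual definition, so treat it as an approximation of the exact implementation):

```python
# Warmup LR schedule: linear ramp for `warmup` steps, then step**-0.5 decay.
# Peaks at the configured base lr (0.005 with warmup_steps 30000 here).
def warmup_lr(step: int, base_lr: float = 0.005, warmup: int = 30000) -> float:
    return base_lr * warmup**0.5 * min(step**-0.5, step * warmup**-1.5)

for step in (1000, 30000, 120000):
    print(step, f"{warmup_lr(step):.6f}")
```

Note that `accum_grad: 2` means each scheduler step corresponds to two forward/backward passes.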
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/acc.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/backward_time.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/cer.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/cer_ctc.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/forward_time.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/iter_time.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/loss.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/loss_att.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/loss_ctc.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/optim0_lr0.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/optim_step_time.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/train_time.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/images/wer.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/valid.acc.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9af14a1f7721767a237734f181eb69c9461fc66fe4671f835e6a1c4381d0ca08
size 139014413
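Per `keep_nbest_models: 10` and the `valid.acc` criterion, `valid.acc.ave_10best.pth` is the element-wise average of the parameters of the ten checkpoints with the best validation accuracy. The operation is just a per-parameter mean over state dicts; a toy sketch with plain floats standing in for tensors (the values are hypothetical, not from this model):

```python
# Element-wise mean over N checkpoint state dicts (floats stand in for tensors).
def average_checkpoints(state_dicts: list) -> dict:
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}

ckpts = [{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 1.0}]
print(average_checkpoints(ckpts))
```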
exp/lm_train_lm_transformer_en_char/config.yaml ADDED
@@ -0,0 +1,190 @@
config: conf/train_lm_transformer.yaml
print_config: false
log_level: INFO
dry_run: false
iterator_type: sequence
output_dir: exp/lm_train_lm_transformer_en_char
ngpu: 1
seed: 0
num_workers: 1
num_att_plot: 3
dist_backend: nccl
dist_init_method: env://
dist_world_size: 2
dist_rank: 0
local_rank: 0
dist_master_addr: localhost
dist_master_port: 44469
dist_launcher: null
multiprocessing_distributed: true
unused_parameters: false
sharded_ddp: false
cudnn_enabled: true
cudnn_benchmark: false
cudnn_deterministic: true
collect_stats: false
write_collected_feats: false
max_epoch: 25
patience: null
val_scheduler_criterion:
- valid
- loss
early_stopping_criterion:
- valid
- loss
- min
best_model_criterion:
- - valid
  - loss
  - min
keep_nbest_models: 10
nbest_averaging_interval: 0
grad_clip: 5.0
grad_clip_type: 2.0
grad_noise: false
accum_grad: 2
no_forward_run: false
resume: true
train_dtype: float32
use_amp: false
log_interval: null
use_matplotlib: true
use_tensorboard: true
create_graph_in_tensorboard: false
use_wandb: false
wandb_project: null
wandb_id: null
wandb_entity: null
wandb_name: null
wandb_model_log_interval: -1
detect_anomaly: false
pretrain_path: null
init_param: []
ignore_init_mismatch: false
freeze_param: []
num_iters_per_epoch: null
batch_size: 20
valid_batch_size: null
batch_bins: 350000
valid_batch_bins: null
train_shape_file:
- exp/lm_stats_en_char/train/text_shape.char
valid_shape_file:
- exp/lm_stats_en_char/valid/text_shape.char
batch_type: numel
valid_batch_type: null
fold_length:
- 150
sort_in_batch: descending
sort_batch: descending
multiple_iterator: false
chunk_length: 500
chunk_shift_ratio: 0.5
num_cache_chunks: 1024
train_data_path_and_name_and_type:
- - dump/raw/lm_train.txt
  - text
  - text
valid_data_path_and_name_and_type:
- - dump/raw/test_dev93/text
  - text
  - text
allow_variable_data_keys: false
max_cache_size: 0.0
max_cache_fd: 32
valid_max_cache_size: null
optim: adam
optim_conf:
    lr: 0.001
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 25000
token_list:
- <blank>
- <unk>
- <space>
- E
- T
- A
- N
- I
- O
- S
- R
- H
- L
- D
- C
- U
- M
- P
- F
- G
- Y
- W
- B
- V
- K
- .
- X
- ''''
- J
- Q
- Z
- <NOISE>
- ','
- '-'
- '"'
- '*'
- ':'
- (
- )
- '?'
- '!'
- '&'
- ;
- '1'
- '2'
- '0'
- /
- $
- '{'
- '}'
- '8'
- '9'
- '6'
- '3'
- '5'
- '7'
- '4'
- '~'
- '`'
- _
- <*IN*>
- <*MR.*>
- \
- ^
- <sos/eos>
init: null
model_conf:
    ignore_id: 0
use_preprocessor: true
token_type: char
bpemodel: null
non_linguistic_symbols: data/nlsyms.txt
cleaner: null
g2p: null
lm: transformer
lm_conf:
    pos_enc: null
    embed_unit: 128
    att_unit: 512
    head: 8
    unit: 2048
    layer: 16
    dropout_rate: 0.1
required:
- output_dir
- token_list
version: '202211'
distributed: true
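The decode directory names in RESULTS (`decode_lm_lm_train_lm_transformer_en_char_...`) show this Transformer LM being combined with the ASR model at decoding time via shallow fusion: each beam-search hypothesis is scored by the ASR model's log-probability plus a weighted LM log-probability. A toy sketch of that score combination (the 0.6 weight and the scores are assumptions for illustration, not values from this repo's decode config):

```python
# Shallow fusion: rank hypotheses by ASR log-prob + lm_weight * LM log-prob.
def fused_score(asr_logp: float, lm_logp: float, lm_weight: float = 0.6) -> float:
    return asr_logp + lm_weight * lm_logp

# Hypothetical (asr_logp, lm_logp) pairs: the LM prefers the fluent spelling.
hyps = {"SHE SELLS": (-4.0, -6.0), "SHE CELLS": (-3.8, -9.0)}
best = max(hyps, key=lambda h: fused_score(*hyps[h]))
print(best)
```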
exp/lm_train_lm_transformer_en_char/images/backward_time.png ADDED
exp/lm_train_lm_transformer_en_char/images/forward_time.png ADDED
exp/lm_train_lm_transformer_en_char/images/gpu_max_cached_mem_GB.png ADDED
exp/lm_train_lm_transformer_en_char/images/iter_time.png ADDED
exp/lm_train_lm_transformer_en_char/images/loss.png ADDED
exp/lm_train_lm_transformer_en_char/images/optim0_lr0.png ADDED
exp/lm_train_lm_transformer_en_char/images/optim_step_time.png ADDED
exp/lm_train_lm_transformer_en_char/images/train_time.png ADDED
exp/lm_train_lm_transformer_en_char/perplexity_test/ppl ADDED
@@ -0,0 +1 @@
2.2880849662126233
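Since this LM is character-level, a test-set perplexity of about 2.288 corresponds to log2(2.288) ≈ 1.19 bits per character:

```python
import math

ppl = 2.2880849662126233  # character-level test-set perplexity from above
bpc = math.log2(ppl)      # bits per character
print(f"{bpc:.3f}")
```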
exp/lm_train_lm_transformer_en_char/valid.loss.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e700b5a85868956df7aae5581f76e3a66115d1bdf2ee031b51454dc21a7010db
size 202290031
meta.yaml ADDED
@@ -0,0 +1,10 @@
espnet: '202211'
files:
  asr_model_file: exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/valid.acc.ave_10best.pth
  lm_file: exp/lm_train_lm_transformer_en_char/valid.loss.ave_10best.pth
python: "3.9.15 (main, Nov 24 2022, 14:31:59) \n[GCC 11.2.0]"
timestamp: 1672204365.481538
torch: 1.12.1
yaml_files:
  asr_train_config: exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_en_char/config.yaml
  lm_train_config: exp/lm_train_lm_transformer_en_char/config.yaml