pyf98 committed on
Commit
fc00281
1 Parent(s): a900390

add model files

Files changed (20)
  1. README.md +332 -0
  2. exp/asr_stats_raw_it_char/train/feats_stats.npz +3 -0
  3. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/RESULTS.md +29 -0
  4. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/config.yaml +227 -0
  5. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/acc.png +0 -0
  6. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/backward_time.png +0 -0
  7. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/cer.png +0 -0
  8. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/cer_ctc.png +0 -0
  9. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/forward_time.png +0 -0
  10. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/gpu_max_cached_mem_GB.png +0 -0
  11. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/iter_time.png +0 -0
  12. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/loss.png +0 -0
  13. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/loss_att.png +0 -0
  14. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/loss_ctc.png +0 -0
  15. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/optim0_lr0.png +0 -0
  16. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/optim_step_time.png +0 -0
  17. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/train_time.png +0 -0
  18. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/wer.png +0 -0
  19. exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/valid.acc.ave_10best.pth +3 -0
  20. meta.yaml +8 -0
README.md ADDED
@@ -0,0 +1,332 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: it
+ datasets:
+ - voxforge
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `pyf98/voxforge_it_e_branchformer`
+
+ This model was trained by Yifan Peng using the voxforge recipe in [espnet](https://github.com/espnet/espnet/).
+
+ References:
+ - [E-Branchformer: Branchformer with Enhanced merging for speech recognition (SLT 2022)](https://arxiv.org/abs/2210.00077)
+ - [Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding (ICML 2022)](https://proceedings.mlr.press/v162/peng22a.html)
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ ```bash
+ cd espnet
+ git checkout bf8c8f00194bdfed8ca388d8b20d14791b7d270e
+ pip install -e .
+ cd egs2/voxforge/asr1
+ ./run.sh --skip_data_prep false --skip_train true --download_model pyf98/voxforge_it_e_branchformer
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Thu Dec 29 01:48:29 EST 2022`
+ - python version: `3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]`
+ - espnet version: `espnet 202211`
+ - pytorch version: `pytorch 1.12.1`
+ - Git hash: `bf8c8f00194bdfed8ca388d8b20d14791b7d270e`
+ - Commit date: `Wed Dec 28 22:43:13 2022 -0500`
+
+ ## asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave/dt_it|1035|12587|71.3|24.0|4.7|3.8|32.5|95.5|
+ |decode_asr_asr_model_valid.acc.ave/et_it|1103|13699|72.8|22.7|4.5|3.1|30.2|91.7|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave/dt_it|1035|75494|93.2|3.7|3.1|2.0|8.8|95.5|
+ |decode_asr_asr_model_valid.acc.ave/et_it|1103|81228|93.8|3.5|2.7|1.8|8.0|91.7|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_asr_e_branchformer_e12_mlp1024_linear1024.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse
+ ngpu: 1
+ seed: 0
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 100
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 128
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_it_char/train/speech_shape
+ - exp/asr_stats_raw_it_char/train/text_shape.char
+ valid_shape_file:
+ - exp/asr_stats_raw_it_char/valid/speech_shape
+ - exp/asr_stats_raw_it_char/valid/text_shape.char
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/tr_it/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/tr_it/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dt_it/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/dt_it/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 0.002
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 10000
+ token_list:
+ - <blank>
+ - <unk>
+ - <space>
+ - A
+ - E
+ - I
+ - O
+ - R
+ - N
+ - L
+ - S
+ - T
+ - C
+ - D
+ - U
+ - M
+ - P
+ - V
+ - G
+ - F
+ - H
+ - B
+ - Q
+ - Z
+ - ''''
+ - Ò
+ - À
+ - È
+ - Ú
+ - X
+ - W
+ - Í
+ - É
+ - Y
+ - K
+ - J
+ - '1'
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: null
+     zero_infinity: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: char
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ frontend: default
+ frontend_conf:
+     fs: 16k
+ specaug: null
+ specaug_conf: {}
+ normalize: global_mvn
+ normalize_conf:
+     stats_file: exp/asr_stats_raw_it_char/train/feats_stats.npz
+     norm_vars: false
+ model: espnet
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: e_branchformer
+ encoder_conf:
+     output_size: 256
+     attention_heads: 4
+     attention_layer_type: rel_selfattn
+     pos_enc_layer_type: rel_pos
+     rel_pos_type: latest
+     cgmlp_linear_units: 1024
+     cgmlp_conv_kernel: 31
+     use_linear_after_conv: false
+     gate_activation: identity
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     input_layer: conv2d
+     layer_drop_rate: 0.0
+     linear_units: 1024
+     positionwise_layer_type: linear
+     use_ffn: true
+     macaron_ffn: true
+     merge_conv_kernel: 31
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.0
+     src_attention_dropout_rate: 0.0
+ preprocessor: default
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202211'
+ distributed: false
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
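Note on `ctc_weight: 0.3` in the config embedded in this README: in a hybrid CTC/attention model the training loss is usually an interpolation of the two branch losses, with `ctc_weight` as the mixing coefficient. The snippet below is a minimal sketch of that interpolation, assuming this is how the setting is applied here; the function name `joint_loss` and the numeric loss values are hypothetical and only illustrate the weighting.

```python
def joint_loss(loss_ctc: float, loss_att: float, ctc_weight: float = 0.3) -> float:
    """Interpolate the CTC and attention losses.

    The default of 0.3 is the ctc_weight value from the config; the
    interpolation form itself is an assumption about how it is used.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


if __name__ == "__main__":
    # Hypothetical per-batch losses, chosen only to show the 30/70 split.
    print(joint_loss(loss_ctc=2.0, loss_att=1.0))
```

With `ctc_weight` at 0.3 the attention decoder contributes 70% of the objective, so the CTC branch acts mainly as an auxiliary alignment signal.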
exp/asr_stats_raw_it_char/train/feats_stats.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8ac4c6cee9a2bbf0175bb92bc350e21038c874b9d01b537b2688f5c8e1bae3d6
+ size 1402
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/RESULTS.md ADDED
@@ -0,0 +1,29 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Thu Dec 29 01:48:29 EST 2022`
+ - python version: `3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]`
+ - espnet version: `espnet 202211`
+ - pytorch version: `pytorch 1.12.1`
+ - Git hash: `bf8c8f00194bdfed8ca388d8b20d14791b7d270e`
+ - Commit date: `Wed Dec 28 22:43:13 2022 -0500`
+
+ ## asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave/dt_it|1035|12587|71.3|24.0|4.7|3.8|32.5|95.5|
+ |decode_asr_asr_model_valid.acc.ave/et_it|1103|13699|72.8|22.7|4.5|3.1|30.2|91.7|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave/dt_it|1035|75494|93.2|3.7|3.1|2.0|8.8|95.5|
+ |decode_asr_asr_model_valid.acc.ave/et_it|1103|81228|93.8|3.5|2.7|1.8|8.0|91.7|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
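Note on reading the WER and CER tables: the columns are percentages of the reference length, so one would expect `Corr = 100 - Sub - Del` and `Err = Sub + Del + Ins` up to rounding. The sketch below checks the four populated rows under that assumption; the values are copied from the tables, while the helper name `check_row` and the tolerance are illustrative choices rather than part of the scoring script.

```python
# (Corr, Sub, Del, Ins, Err) copied from the WER and CER tables above.
ROWS = {
    "WER dt_it": (71.3, 24.0, 4.7, 3.8, 32.5),
    "WER et_it": (72.8, 22.7, 4.5, 3.1, 30.2),
    "CER dt_it": (93.2, 3.7, 3.1, 2.0, 8.8),
    "CER et_it": (93.8, 3.5, 2.7, 1.8, 8.0),
}


def check_row(corr, sub, dele, ins, err, tol=0.2 + 1e-9):
    """Return True if the row satisfies both identities within rounding.

    Each printed value may be off by up to 0.05, so a sum of three
    rounded terms compared with a rounded total can differ by up to 0.2.
    """
    corr_ok = abs((100.0 - sub - dele) - corr) <= tol
    err_ok = abs((sub + dele + ins) - err) <= tol
    return corr_ok and err_ok


if __name__ == "__main__":
    for name, row in ROWS.items():
        print(name, "consistent" if check_row(*row) else "inconsistent")
```

The only row where the printed components do not sum exactly is WER on `et_it` (22.7 + 4.5 + 3.1 = 30.3 against a reported 30.2), which is within the rounding tolerance.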
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/config.yaml ADDED
@@ -0,0 +1,227 @@
+ config: conf/tuning/train_asr_e_branchformer_e12_mlp1024_linear1024.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse
+ ngpu: 1
+ seed: 0
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 100
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 128
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_it_char/train/speech_shape
+ - exp/asr_stats_raw_it_char/train/text_shape.char
+ valid_shape_file:
+ - exp/asr_stats_raw_it_char/valid/speech_shape
+ - exp/asr_stats_raw_it_char/valid/text_shape.char
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/tr_it/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/tr_it/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dt_it/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/dt_it/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 0.002
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 10000
+ token_list:
+ - <blank>
+ - <unk>
+ - <space>
+ - A
+ - E
+ - I
+ - O
+ - R
+ - N
+ - L
+ - S
+ - T
+ - C
+ - D
+ - U
+ - M
+ - P
+ - V
+ - G
+ - F
+ - H
+ - B
+ - Q
+ - Z
+ - ''''
+ - Ò
+ - À
+ - È
+ - Ú
+ - X
+ - W
+ - Í
+ - É
+ - Y
+ - K
+ - J
+ - '1'
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: null
+     zero_infinity: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: char
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ frontend: default
+ frontend_conf:
+     fs: 16k
+ specaug: null
+ specaug_conf: {}
+ normalize: global_mvn
+ normalize_conf:
+     stats_file: exp/asr_stats_raw_it_char/train/feats_stats.npz
+     norm_vars: false
+ model: espnet
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: e_branchformer
+ encoder_conf:
+     output_size: 256
+     attention_heads: 4
+     attention_layer_type: rel_selfattn
+     pos_enc_layer_type: rel_pos
+     rel_pos_type: latest
+     cgmlp_linear_units: 1024
+     cgmlp_conv_kernel: 31
+     use_linear_after_conv: false
+     gate_activation: identity
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     input_layer: conv2d
+     layer_drop_rate: 0.0
+     linear_units: 1024
+     positionwise_layer_type: linear
+     use_ffn: true
+     macaron_ffn: true
+     merge_conv_kernel: 31
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.0
+     src_attention_dropout_rate: 0.0
+ preprocessor: default
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202211'
+ distributed: false
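Note on `scheduler: warmuplr` with `lr: 0.002` and `warmup_steps: 10000`: a warmup schedule of this kind typically raises the learning rate linearly to the base value over the warmup steps and then decays it with the inverse square root of the step. The sketch below assumes the Noam-style rule `base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)`; the function name `warmup_lr` is illustrative, and the formula should be checked against the scheduler implementation at the pinned commit before relying on it.

```python
def warmup_lr(step: int, base_lr: float = 0.002, warmup_steps: int = 10000) -> float:
    """Learning rate at a given optimizer step (1-based).

    Assumed rule: linear ramp up to base_lr at step == warmup_steps,
    then decay proportional to step ** -0.5.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    # Defaults are the lr and warmup_steps values from config.yaml.
    for step in (1, 2500, 10000, 40000):
        print(step, warmup_lr(step))
```

Under this rule the peak rate equals the configured `lr` and is reached exactly at step 10000; by step 40000 it has halved.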
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/acc.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/backward_time.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/cer.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/cer_ctc.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/forward_time.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/iter_time.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/loss.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/loss_att.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/loss_ctc.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/optim0_lr0.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/optim_step_time.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/train_time.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/images/wer.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/valid.acc.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4ab27c2b3cebd3d0ab4762bc63084250bd22543239310e71d41fca00db19b6c0
+ size 138931213
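Note on the three-line stubs shown for `feats_stats.npz` and `valid.acc.ave_10best.pth`: these are Git LFS pointer files, so the diff records the hash and byte size of the binary rather than its contents. A minimal sketch of reading such a pointer follows, assuming the simple `key value` layout visible above; the helper name `parse_lfs_pointer` is hypothetical.

```python
# Pointer text copied from the valid.acc.ave_10best.pth entry above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4ab27c2b3cebd3d0ab4762bc63084250bd22543239310e71d41fca00db19b6c0
size 138931213
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a pointer file into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER)
    print(info["hash_algo"], info["size"], round(info["size"] / 2**20, 1), "MiB")
```

The `size` field shows the averaged checkpoint is about 132.5 MiB, which is the amount a client has to download when it resolves the pointer.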
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202211'
+ files:
+   asr_model_file: exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/valid.acc.ave_10best.pth
+ python: "3.9.15 (main, Nov 24 2022, 14:31:59) \n[GCC 11.2.0]"
+ timestamp: 1672296511.658743
+ torch: 1.12.1
+ yaml_files:
+   asr_train_config: exp/asr_train_asr_e_branchformer_e12_mlp1024_linear1024_raw_it_char_normalize_confnorm_varsFalse/config.yaml
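Note on `timestamp` in `meta.yaml`: the value looks like seconds since the Unix epoch. Assuming that reading, the sketch below converts it to a UTC datetime using only the standard library; the helper name `to_utc` is illustrative.

```python
from datetime import datetime, timezone

# Value copied from the timestamp field of meta.yaml.
TIMESTAMP = 1672296511.658743


def to_utc(ts: float) -> datetime:
    """Interpret ts as Unix epoch seconds and return an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


if __name__ == "__main__":
    print(to_utc(TIMESTAMP).strftime("%Y-%m-%d %H:%M:%S %Z"))
```

This gives 2022-12-29 06:48:31 UTC, which is 01:48:31 at UTC-5 and so falls within a few seconds of the `date` recorded in RESULTS.md (`Thu Dec 29 01:48:29 EST 2022`).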