Jiyang Tang committed on
Commit
9db65ab
1 Parent(s): 50cc396

Update model

README.md ADDED
@@ -0,0 +1,385 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: en
+ datasets:
+ - A
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `espnet/jiyang_tang_aphsiabank_english_asr_ebranchformer_wavlm_aph_en_both`
+
+ This model was trained by Jiyang Tang using A recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ ```bash
+ cd espnet
+ git checkout edf949f535938da8c705c1d26cc561b2d4cb4778
+ pip install -e .
+ cd jtang1/A/asr1
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/jiyang_tang_aphsiabank_english_asr_ebranchformer_wavlm_aph_en_both
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Tue Mar 7 12:06:32 EST 2023`
+ - python version: `3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0]`
+ - espnet version: `espnet 202301`
+ - pytorch version: `pytorch 1.8.1`
+ - Git hash: `b0b2a0aa9c335267046e83036b87e88af30698da`
+ - Commit date: `Tue Feb 7 14:56:31 2023 -0500`
+
+ ## asr_ebranchformer_wavlm_aph_en_both
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_model_valid.acc.ave/test|28424|296887|83.0|12.4|4.6|2.7|19.6|71.1|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_model_valid.acc.ave/test|28424|1507391|91.9|3.0|5.1|3.0|11.1|71.1|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_asr_ebranchformer_small_wavlm_large1.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_ebranchformer_wavlm_aph_en_both
+ ngpu: 1
+ seed: 2022
+ num_workers: 4
+ num_att_plot: 0
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 2
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 44175
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 30
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 8
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: 200
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param:
+ - frontend.upstream
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 6000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_en_char_sp/train/speech_shape
+ - exp/asr_stats_raw_en_char_sp/train/text_shape.char
+ valid_shape_file:
+ - exp/asr_stats_raw_en_char_sp/valid/speech_shape
+ - exp/asr_stats_raw_en_char_sp/valid/text_shape.char
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train_sp/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/val/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/val/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.001
+     weight_decay: 1.0e-06
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 2500
+ token_list:
+ - <blank>
+ - <unk>
+ - '[APH]'
+ - '[NONAPH]'
+ - <space>
+ - e
+ - t
+ - a
+ - h
+ - o
+ - A
+ - n
+ - '['
+ - P
+ - H
+ - ']'
+ - i
+ - s
+ - N
+ - d
+ - r
+ - u
+ - l
+ - m
+ - w
+ - O
+ - y
+ - g
+ - c
+ - b
+ - f
+ - p
+ - k
+ - ''''
+ - v
+ - j
+ - <
+ - L
+ - U
+ - '>'
+ - ɪ
+ - x
+ - ə
+ - z
+ - ɛ
+ - ɑ
+ - q
+ - ɹ
+ - æ
+ - ˞
+ - ʌ
+ - ʃ
+ - ʊ
+ - ɔ
+ - ŋ
+ - ɚ
+ - ɾ
+ - ʒ
+ - ð
+ - θ
+ - ɜ
+ - ɝ
+ - ɡ
+ - '0'
+ - ː
+ - ʔ
+ - ɒ
+ - é
+ - ɸ
+ - ̩
+ - ʤ
+ - ʧ
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: null
+     zero_infinity: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: char
+ bpemodel: null
+ non_linguistic_symbols: local/nlsyms.txt
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ aux_ctc_tasks: []
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: wavlm_large
+     download_dir: ./hub
+     multilayer_feature: true
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 27
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.05
+     num_time_mask: 5
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ preencoder: linear
+ preencoder_conf:
+     input_size: 1024
+     output_size: 80
+ encoder: e_branchformer
+ encoder_conf:
+     output_size: 256
+     attention_heads: 4
+     linear_units: 1024
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     layer_drop_rate: 0.1
+     input_layer: conv2d1
+     macaron_ffn: true
+     pos_enc_layer_type: rel_pos
+     attention_layer_type: rel_selfattn
+     rel_pos_type: latest
+     cgmlp_linear_units: 3072
+     cgmlp_conv_kernel: 31
+     use_linear_after_conv: false
+     gate_activation: identity
+     positionwise_layer_type: linear
+     use_ffn: true
+     merge_conv_kernel: 31
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.1
+     src_attention_dropout_rate: 0.1
+     layer_drop_rate: 0.2
+ preprocessor: default
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202301'
+ distributed: true
+ ```
+
+ </details>
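The `ctc_weight: 0.3` entry under `model_conf` in the config above controls how the two training losses of a hybrid CTC/attention model are mixed. The following is a minimal Python sketch, assuming the usual linear interpolation between the CTC loss and the attention-decoder loss. The function name `hybrid_loss` and the loss values passed to it are illustrative placeholders, not code or numbers taken from this training run.

```python
def hybrid_loss(loss_ctc, loss_att, ctc_weight=0.3):
    """Interpolate CTC and attention losses (assumed hybrid objective).

    ctc_weight=0.3 mirrors model_conf.ctc_weight in the config;
    the remaining 0.7 goes to the attention-decoder loss.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


# Arbitrary example values, chosen only to show the weighting.
print(hybrid_loss(loss_ctc=40.0, loss_att=20.0))
```

With `ctc_weight` at 0 the objective reduces to pure attention training, and at 1 to pure CTC training.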
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
+ year={2018},
+ booktitle={Proceedings of Interspeech},
+ pages={2207--2211},
+ doi={10.21437/Interspeech.2018-1456},
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+ title={ESPnet: End-to-End Speech Processing Toolkit},
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ year={2018},
+ eprint={1804.00015},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
exp/asr_ebranchformer_wavlm_aph_en_both/RESULTS.md ADDED
@@ -0,0 +1,27 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Tue Mar 7 12:06:32 EST 2023`
+ - python version: `3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0]`
+ - espnet version: `espnet 202301`
+ - pytorch version: `pytorch 1.8.1`
+ - Git hash: `b0b2a0aa9c335267046e83036b87e88af30698da`
+ - Commit date: `Tue Feb 7 14:56:31 2023 -0500`
+
+ ## asr_ebranchformer_wavlm_aph_en_both
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_model_valid.acc.ave/test|28424|296887|83.0|12.4|4.6|2.7|19.6|71.1|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_model_valid.acc.ave/test|28424|1507391|91.9|3.0|5.1|3.0|11.1|71.1|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
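The columns of the WER and CER tables above are related by simple arithmetic. Assuming the standard scoring definitions (Err is the sum of the Sub, Del and Ins rates, and Corr is what is left after Sub and Del), the short Python sketch below checks that the two populated rows are self-consistent. The helper name `check_row` and its tolerance are illustrative choices, not part of the scoring script.

```python
def check_row(corr, sub, dele, ins, err, tol=0.2):
    """Return True if a score row matches the usual definitions.

    Err  = Sub + Del + Ins
    Corr = 100 - Sub - Del
    tol absorbs the rounding of each column to one decimal place.
    """
    err_ok = abs((sub + dele + ins) - err) <= tol
    corr_ok = abs((100.0 - sub - dele) - corr) <= tol
    return err_ok and corr_ok


# Values copied from the WER and CER rows of the tables above.
wer_row = dict(corr=83.0, sub=12.4, dele=4.6, ins=2.7, err=19.6)
cer_row = dict(corr=91.9, sub=3.0, dele=5.1, ins=3.0, err=11.1)

print("WER row consistent:", check_row(**wer_row))
print("CER row consistent:", check_row(**cer_row))
```

For the WER row the three error rates sum to 19.7 against a reported 19.6, a gap that fits within the rounding of the individual columns.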
exp/asr_ebranchformer_wavlm_aph_en_both/config.yaml ADDED
@@ -0,0 +1,286 @@
+ config: conf/tuning/train_asr_ebranchformer_small_wavlm_large1.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_ebranchformer_wavlm_aph_en_both
+ ngpu: 1
+ seed: 2022
+ num_workers: 4
+ num_att_plot: 0
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 2
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 44175
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 30
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 8
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: 200
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param:
+ - frontend.upstream
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 6000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_en_char_sp/train/speech_shape
+ - exp/asr_stats_raw_en_char_sp/train/text_shape.char
+ valid_shape_file:
+ - exp/asr_stats_raw_en_char_sp/valid/speech_shape
+ - exp/asr_stats_raw_en_char_sp/valid/text_shape.char
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train_sp/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/val/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/val/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.001
+     weight_decay: 1.0e-06
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 2500
+ token_list:
+ - <blank>
+ - <unk>
+ - '[APH]'
+ - '[NONAPH]'
+ - <space>
+ - e
+ - t
+ - a
+ - h
+ - o
+ - A
+ - n
+ - '['
+ - P
+ - H
+ - ']'
+ - i
+ - s
+ - N
+ - d
+ - r
+ - u
+ - l
+ - m
+ - w
+ - O
+ - y
+ - g
+ - c
+ - b
+ - f
+ - p
+ - k
+ - ''''
+ - v
+ - j
+ - <
+ - L
+ - U
+ - '>'
+ - ɪ
+ - x
+ - ə
+ - z
+ - ɛ
+ - ɑ
+ - q
+ - ɹ
+ - æ
+ - ˞
+ - ʌ
+ - ʃ
+ - ʊ
+ - ɔ
+ - ŋ
+ - ɚ
+ - ɾ
+ - ʒ
+ - ð
+ - θ
+ - ɜ
+ - ɝ
+ - ɡ
+ - '0'
+ - ː
+ - ʔ
+ - ɒ
+ - é
+ - ɸ
+ - ̩
+ - ʤ
+ - ʧ
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: null
+     zero_infinity: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: char
+ bpemodel: null
+ non_linguistic_symbols: local/nlsyms.txt
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ aux_ctc_tasks: []
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: wavlm_large
+     download_dir: ./hub
+     multilayer_feature: true
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 27
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.05
+     num_time_mask: 5
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ preencoder: linear
+ preencoder_conf:
+     input_size: 1024
+     output_size: 80
+ encoder: e_branchformer
+ encoder_conf:
+     output_size: 256
+     attention_heads: 4
+     linear_units: 1024
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     layer_drop_rate: 0.1
+     input_layer: conv2d1
+     macaron_ffn: true
+     pos_enc_layer_type: rel_pos
+     attention_layer_type: rel_selfattn
+     rel_pos_type: latest
+     cgmlp_linear_units: 3072
+     cgmlp_conv_kernel: 31
+     use_linear_after_conv: false
+     gate_activation: identity
+     positionwise_layer_type: linear
+     use_ffn: true
+     merge_conv_kernel: 31
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.1
+     src_attention_dropout_rate: 0.1
+     layer_drop_rate: 0.2
+ preprocessor: default
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202301'
+ distributed: true
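The optimizer settings in this config (`optim: adam` with `lr: 0.001`, `scheduler: warmuplr` with `warmup_steps: 2500`) describe a Noam-style schedule. The learning rate ramps up linearly to the base value over the warmup steps and then decays with the inverse square root of the step. Below is a minimal Python sketch of that rule, assuming the formula given in ESPnet's `WarmupLR` documentation. The function name `warmup_lr` is a stand-in, and steps are taken to count optimizer updates starting from 1.

```python
def warmup_lr(step, base_lr=0.001, warmup_steps=2500):
    """Learning rate at a given optimizer step under a warmup schedule.

    lr = base_lr * warmup_steps**0.5
         * min(step**-0.5, step * warmup_steps**-1.5)
    Defaults mirror optim_conf.lr and scheduler_conf.warmup_steps.
    """
    if step < 1:
        raise ValueError("step counts from 1")
    scale = min(step ** -0.5, step * warmup_steps ** -1.5)
    return base_lr * warmup_steps ** 0.5 * scale


# Print the rate early in warmup, mid-warmup, at the peak, and after decay.
for step in (1, 1250, 2500, 10000):
    print(step, f"{warmup_lr(step):.7f}")
```

The peak equals the base rate and is reached exactly at `warmup_steps`. If the scheduler advances once per optimizer update, `accum_grad: 8` would mean each step here spans eight mini-batches, though that detail is an assumption about the trainer rather than something the config states.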
exp/asr_ebranchformer_wavlm_aph_en_both/images/acc.png ADDED
exp/asr_ebranchformer_wavlm_aph_en_both/images/backward_time.png ADDED
exp/asr_ebranchformer_wavlm_aph_en_both/images/cer.png ADDED
exp/asr_ebranchformer_wavlm_aph_en_both/images/cer_ctc.png ADDED
exp/asr_ebranchformer_wavlm_aph_en_both/images/forward_time.png ADDED
exp/asr_ebranchformer_wavlm_aph_en_both/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_ebranchformer_wavlm_aph_en_both/images/iter_time.png ADDED
exp/asr_ebranchformer_wavlm_aph_en_both/images/loss.png ADDED
exp/asr_ebranchformer_wavlm_aph_en_both/images/loss_att.png ADDED
exp/asr_ebranchformer_wavlm_aph_en_both/images/loss_ctc.png ADDED
exp/asr_ebranchformer_wavlm_aph_en_both/images/optim0_lr0.png ADDED
exp/asr_ebranchformer_wavlm_aph_en_both/images/optim_step_time.png ADDED
exp/asr_ebranchformer_wavlm_aph_en_both/images/train_time.png ADDED
exp/asr_ebranchformer_wavlm_aph_en_both/images/wer.png ADDED
exp/asr_ebranchformer_wavlm_aph_en_both/valid.acc.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cac500148ed6b9f9b0be80e21e56dc57b285b58ce577787d2a3b8daccbaee0dc
+ size 1455789953
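The three lines added for `valid.acc.ave_10best.pth` are a Git LFS pointer, not the checkpoint itself. The repository stores only the spec version, the SHA-256 object ID and the byte size, while the weights live in LFS storage. As a rough illustration, the Python sketch below parses such a pointer and reports the checkpoint size. The function name `parse_lfs_pointer` is hypothetical, and the parser handles only the three keys seen here.

```python
def parse_lfs_pointer(text):
    """Parse the space-separated key/value lines of a Git LFS pointer."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the diff above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:cac500148ed6b9f9b0be80e21e56dc57b285b58ce577787d2a3b8daccbaee0dc
size 1455789953
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], f"{info['size_bytes'] / 1e9:.2f} GB")
```

The recorded size works out to roughly 1.46 GB, so cloning without LFS support would leave only this small text file in place of the model weights.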
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202301'
+ files:
+   asr_model_file: exp/asr_ebranchformer_wavlm_aph_en_both/valid.acc.ave_10best.pth
+ python: "3.9.12 (main, Apr 5 2022, 06:56:58) \n[GCC 7.5.0]"
+ timestamp: 1678208795.72663
+ torch: 1.8.1
+ yaml_files:
+   asr_train_config: exp/asr_ebranchformer_wavlm_aph_en_both/config.yaml