pyf98 committed
Commit ec92c76 (1 parent: 4f19e41)

add model files

Files changed (21)
  1. README.md +805 -0
  2. data/en_token_list/bpe_unigram500/bpe.model +3 -0
  3. exp/asr_stats_raw_en_bpe500_sp/train/feats_stats.npz +3 -0
  4. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/RESULTS.md +32 -0
  5. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/config.yaml +698 -0
  6. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/acc.png +0 -0
  7. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/backward_time.png +0 -0
  8. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/cer.png +0 -0
  9. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/cer_ctc.png +0 -0
  10. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/forward_time.png +0 -0
  11. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/gpu_max_cached_mem_GB.png +0 -0
  12. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/iter_time.png +0 -0
  13. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/loss.png +0 -0
  14. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/loss_att.png +0 -0
  15. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/loss_ctc.png +0 -0
  16. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/optim0_lr0.png +0 -0
  17. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/optim_step_time.png +0 -0
  18. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/train_time.png +0 -0
  19. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/wer.png +0 -0
  20. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/valid.cer_ctc.ave_10best.pth +3 -0
  21. meta.yaml +8 -0
README.md ADDED
@@ -0,0 +1,805 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: en
+ datasets:
+ - tedlium2
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `pyf98/tedlium2_ctc_e_branchformer`
+
+ This model was trained by Yifan Peng using the tedlium2 recipe in [espnet](https://github.com/espnet/espnet/).
+
+ References:
+ - [E-Branchformer: Branchformer with Enhanced merging for speech recognition (SLT 2022)](https://arxiv.org/abs/2210.00077)
+ - [Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding (ICML 2022)](https://proceedings.mlr.press/v162/peng22a.html)
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done so already.
+
+ ```bash
+ cd espnet
+ git checkout e62de171f1d11015cb856f83780c61bd5ca7fa8f
+ pip install -e .
+ cd egs2/tedlium2/asr1
+ ./run.sh --skip_data_prep false --skip_train true --download_model pyf98/tedlium2_ctc_e_branchformer
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Fri Dec 30 20:15:46 CST 2022`
+ - python version: `3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]`
+ - espnet version: `espnet 202211`
+ - pytorch version: `pytorch 1.12.1`
+ - Git hash: `e62de171f1d11015cb856f83780c61bd5ca7fa8f`
+ - Commit date: `Thu Dec 29 14:18:44 2022 -0500`
+
+ ## asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_ctc_asr_model_valid.cer_ctc.ave/dev|466|14671|92.5|5.5|2.0|1.2|8.7|77.3|
+ |decode_asr_ctc_asr_model_valid.cer_ctc.ave/test|1155|27500|92.7|4.9|2.3|1.1|8.3|70.6|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_ctc_asr_model_valid.cer_ctc.ave/dev|466|78259|97.2|0.9|1.9|1.2|4.0|77.3|
+ |decode_asr_ctc_asr_model_valid.cer_ctc.ave/test|1155|145066|97.1|0.9|2.0|1.1|4.0|70.6|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_ctc_asr_model_valid.cer_ctc.ave/dev|466|28296|94.7|3.1|2.2|1.2|6.5|77.3|
+ |decode_asr_ctc_asr_model_valid.cer_ctc.ave/test|1155|52113|95.0|2.7|2.2|1.1|6.1|70.6|
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_asr_ctc_e_branchformer_e12_mlp1024_linear1024.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp
+ ngpu: 1
+ seed: 2022
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 2
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 47545
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 50
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - cer_ctc
+     - min
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 50000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_en_bpe500_sp/train/speech_shape
+ - exp/asr_stats_raw_en_bpe500_sp/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_raw_en_bpe500_sp/valid/speech_shape
+ - exp/asr_stats_raw_en_bpe500_sp/valid/text_shape.bpe
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train_sp/wav.scp
+     - speech
+     - kaldi_ark
+ -   - dump/raw/train_sp/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev/wav.scp
+     - speech
+     - kaldi_ark
+ -   - dump/raw/dev/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 0.002
+     weight_decay: 1.0e-06
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 15000
+ token_list:
+ - <blank>
+ - <unk>
+ - s
+ - ▁the
+ - t
+ - ▁a
+ - ▁and
+ - ▁to
+ - d
+ - e
+ - ▁of
+ - ''''
+ - n
+ - ing
+ - ▁in
+ - ▁i
+ - ▁that
+ - i
+ - a
+ - l
+ - p
+ - m
+ - y
+ - o
+ - ▁it
+ - ▁we
+ - c
+ - u
+ - ▁you
+ - ed
+ - ▁
+ - r
+ - ▁is
+ - re
+ - ▁this
+ - ar
+ - g
+ - ▁so
+ - al
+ - b
+ - ▁s
+ - or
+ - ▁f
+ - ▁c
+ - in
+ - k
+ - f
+ - ▁for
+ - ic
+ - er
+ - le
+ - ▁be
+ - ▁do
+ - ▁re
+ - ve
+ - ▁e
+ - ▁w
+ - ▁was
+ - es
+ - ▁they
+ - ly
+ - h
+ - ▁on
+ - v
+ - ▁are
+ - ri
+ - ▁have
+ - an
+ - ▁what
+ - ▁with
+ - ▁t
+ - w
+ - ur
+ - it
+ - ent
+ - ▁can
+ - ▁he
+ - ▁but
+ - ra
+ - ce
+ - ▁me
+ - ▁b
+ - ▁ma
+ - ▁p
+ - ll
+ - ▁st
+ - ▁one
+ - 'on'
+ - ▁about
+ - th
+ - ▁de
+ - en
+ - ▁all
+ - ▁not
+ - il
+ - ▁g
+ - ch
+ - at
+ - ▁there
+ - ▁mo
+ - ter
+ - ation
+ - tion
+ - ▁at
+ - ▁my
+ - ro
+ - ▁as
+ - te
+ - ▁le
+ - ▁con
+ - ▁like
+ - ▁people
+ - ▁or
+ - ▁an
+ - el
+ - ▁if
+ - ▁from
+ - ver
+ - ▁su
+ - ▁co
+ - ate
+ - ▁these
+ - ol
+ - ci
+ - ▁now
+ - ▁see
+ - ▁out
+ - ▁our
+ - ion
+ - ▁know
+ - ect
+ - ▁just
+ - as
+ - ▁ex
+ - ▁ch
+ - ▁d
+ - ▁when
+ - ▁very
+ - ▁think
+ - ▁who
+ - ▁because
+ - ▁go
+ - ▁up
+ - ▁us
+ - ▁pa
+ - ▁no
+ - ies
+ - ▁di
+ - ▁ho
+ - om
+ - ive
+ - ▁get
+ - id
+ - ▁o
+ - ▁hi
+ - un
+ - ▁how
+ - ▁by
+ - ir
+ - et
+ - ck
+ - ity
+ - ▁po
+ - ul
+ - ▁which
+ - ▁mi
+ - ▁some
+ - z
+ - ▁sp
+ - ▁un
+ - ▁going
+ - ▁pro
+ - ist
+ - ▁se
+ - ▁look
+ - ▁time
+ - ment
+ - de
+ - ▁more
+ - ▁had
+ - ng
+ - ▁would
+ - ge
+ - la
+ - ▁here
+ - ▁really
+ - x
+ - ▁your
+ - ▁them
+ - us
+ - me
+ - ▁en
+ - ▁two
+ - ▁k
+ - ▁li
+ - ▁world
+ - ne
+ - ow
+ - ▁way
+ - ▁want
+ - ▁work
+ - ▁don
+ - ▁lo
+ - ▁fa
+ - ▁were
+ - ▁their
+ - age
+ - vi
+ - ▁ha
+ - ac
+ - der
+ - est
+ - ▁bo
+ - am
+ - ▁other
+ - able
+ - ▁actually
+ - ▁sh
+ - ▁make
+ - ▁ba
+ - ▁la
+ - ine
+ - ▁into
+ - ▁where
+ - ▁could
+ - ▁comp
+ - ting
+ - ▁has
+ - ▁will
+ - ▁ne
+ - j
+ - ical
+ - ally
+ - ▁vi
+ - ▁things
+ - ▁te
+ - igh
+ - ▁say
+ - ▁years
+ - ers
+ - ▁ra
+ - ther
+ - ▁than
+ - ru
+ - ▁ro
+ - op
+ - ▁did
+ - ▁any
+ - ▁new
+ - ound
+ - ig
+ - ▁well
+ - mo
+ - ▁she
+ - ▁na
+ - ▁been
+ - he
+ - ▁thousand
+ - ▁car
+ - ▁take
+ - ▁right
+ - ▁then
+ - ▁need
+ - ▁start
+ - ▁hundred
+ - ▁something
+ - ▁over
+ - ▁com
+ - ia
+ - ▁kind
+ - um
+ - if
+ - ▁those
+ - ▁first
+ - ▁pre
+ - ta
+ - ▁said
+ - ize
+ - end
+ - ▁even
+ - ▁thing
+ - one
+ - ▁back
+ - ite
+ - ▁every
+ - ▁little
+ - ry
+ - ▁life
+ - ▁much
+ - ke
+ - ▁also
+ - ▁most
+ - ant
+ - per
+ - ▁three
+ - ▁come
+ - ▁lot
+ - ance
+ - ▁got
+ - ▁talk
+ - ▁per
+ - ▁inter
+ - ▁sa
+ - ▁use
+ - ▁mu
+ - ▁part
+ - ish
+ - ence
+ - ▁happen
+ - ▁bi
+ - ▁mean
+ - ough
+ - ▁qu
+ - ▁bu
+ - ▁day
+ - ▁ga
+ - ▁only
+ - ▁many
+ - ▁different
+ - ▁dr
+ - ▁th
+ - ▁show
+ - ful
+ - ▁down
+ - ated
+ - ▁good
+ - ▁tra
+ - ▁around
+ - ▁idea
+ - ▁human
+ - ous
+ - ▁put
+ - ▁through
+ - ▁five
+ - ▁why
+ - ▁change
+ - ▁real
+ - ff
+ - ible
+ - ▁fact
+ - ▁same
+ - ▁jo
+ - ▁live
+ - ▁year
+ - ▁problem
+ - ▁ph
+ - ▁four
+ - ▁give
+ - ▁big
+ - ▁tell
+ - ▁great
+ - ▁try
+ - ▁va
+ - ▁ru
+ - ▁system
+ - ▁six
+ - ▁plan
+ - ▁place
+ - ▁build
+ - ▁called
+ - ▁again
+ - ▁point
+ - ▁twenty
+ - ▁percent
+ - ▁nine
+ - ▁find
+ - ▁app
+ - ▁after
+ - ▁long
+ - ▁eight
+ - ▁imp
+ - ▁gene
+ - ▁design
+ - ▁today
+ - ▁should
+ - ▁made
+ - ious
+ - ▁came
+ - ▁learn
+ - ▁last
+ - ▁own
+ - way
+ - ▁turn
+ - ▁seven
+ - ▁high
+ - ▁question
+ - ▁person
+ - ▁brain
+ - ▁important
+ - ▁another
+ - ▁thought
+ - ▁trans
+ - ▁create
+ - ness
+ - ▁hu
+ - ▁power
+ - ▁act
+ - land
+ - ▁play
+ - ▁sort
+ - ▁old
+ - ▁before
+ - ▁course
+ - ▁understand
+ - ▁feel
+ - ▁might
+ - ▁each
+ - ▁million
+ - ▁better
+ - ▁together
+ - ▁ago
+ - ▁example
+ - ▁help
+ - ▁story
+ - ▁next
+ - ▁hand
+ - ▁school
+ - ▁water
+ - ▁develop
+ - ▁technology
+ - que
+ - ▁second
+ - ▁grow
+ - ▁still
+ - ▁cell
+ - ▁believe
+ - ▁number
+ - ▁small
+ - ▁between
+ - qui
+ - ▁data
+ - ▁become
+ - ▁america
+ - ▁maybe
+ - ▁space
+ - ▁project
+ - ▁organ
+ - ▁vo
+ - ▁children
+ - ▁book
+ - graph
+ - ▁open
+ - ▁fifty
+ - ▁picture
+ - ▁health
+ - ▁thirty
+ - ▁africa
+ - ▁reason
+ - ▁large
+ - ▁hard
+ - ▁computer
+ - ▁always
+ - ▁sense
+ - ▁money
+ - ▁women
+ - ▁everything
+ - ▁information
+ - ▁country
+ - ▁teach
+ - ▁energy
+ - ▁experience
+ - ▁food
+ - ▁process
+ - qua
+ - ▁interesting
+ - ▁future
+ - ▁science
+ - q
+ - '0'
+ - '5'
+ - '6'
+ - '9'
+ - '3'
+ - '8'
+ - '4'
+ - N
+ - A
+ - '7'
+ - S
+ - G
+ - F
+ - R
+ - L
+ - U
+ - E
+ - T
+ - H
+ - _
+ - B
+ - D
+ - J
+ - M
+ - ă
+ - ō
+ - ť
+ - '2'
+ - '-'
+ - '1'
+ - C
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: null
+     zero_infinity: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/en_token_list/bpe_unigram500/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ frontend: default
+ frontend_conf:
+     n_fft: 512
+     win_length: 400
+     hop_length: 160
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 27
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.05
+     num_time_mask: 5
+ normalize: global_mvn
+ normalize_conf:
+     stats_file: exp/asr_stats_raw_en_bpe500_sp/train/feats_stats.npz
+ model: espnet
+ model_conf:
+     ctc_weight: 1.0
+     lsm_weight: 0.1
+     length_normalized_loss: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: e_branchformer
+ encoder_conf:
+     output_size: 256
+     attention_heads: 4
+     attention_layer_type: rel_selfattn
+     pos_enc_layer_type: rel_pos
+     rel_pos_type: latest
+     cgmlp_linear_units: 1024
+     cgmlp_conv_kernel: 31
+     use_linear_after_conv: false
+     gate_activation: identity
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     input_layer: conv2d
+     layer_drop_rate: 0.0
+     linear_units: 1024
+     positionwise_layer_type: linear
+     use_ffn: true
+     macaron_ffn: true
+     merge_conv_kernel: 31
+ postencoder: null
+ postencoder_conf: {}
+ decoder: rnn
+ decoder_conf: {}
+ preprocessor: default
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202211'
+ distributed: true
+ ```
+
+ </details>
+
+ ### Citing ESPnet
+
+ ```bibtex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
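
Note on the learning-rate settings in the ASR config of this README (`optim_conf.lr: 0.002`, `scheduler: warmuplr`, `scheduler_conf.warmup_steps: 15000`). The sketch below shows how such a schedule evolves, assuming ESPnet's `WarmupLR` uses the usual Noam-style rule (linear ramp up to the base rate, then inverse-square-root decay). The function name `warmup_lr` is illustrative and not part of ESPnet; check `espnet2/schedulers/warmup_lr.py` at the pinned commit for the authoritative formula.

```python
def warmup_lr(step: int, base_lr: float = 0.002, warmup_steps: int = 15000) -> float:
    """Noam-style warmup (assumed form): ramps linearly to base_lr at
    step == warmup_steps, then decays proportionally to 1/sqrt(step)."""
    if step < 1:
        raise ValueError("step must be >= 1")
    ramp = step * warmup_steps ** -1.5   # active while step < warmup_steps
    decay = step ** -0.5                 # active once step > warmup_steps
    return base_lr * warmup_steps ** 0.5 * min(ramp, decay)


if __name__ == "__main__":
    for s in (1, 7500, 15000, 60000):
        print(s, warmup_lr(s))
```

Under this assumption the rate peaks at the configured `lr` exactly at step 15000, is half of it at step 7500, and falls back to half again by step 60000.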
data/en_token_list/bpe_unigram500/bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ca848c3a0b756847776bc5c8e8ae797ad73381cb4fe9db9109b3131e9416b5f6
+ size 244853
exp/asr_stats_raw_en_bpe500_sp/train/feats_stats.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9aa2bdc65662202e277008f62275fef28e17e564fbcf6b759a4a169cdcfdbbd
+ size 1402
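
The two three-line files above are Git LFS pointers rather than the binaries themselves: each records a spec version, a SHA-256 object ID, and the size in bytes of the real file. As a rough illustration, the sketch below parses such a pointer; `parse_lfs_pointer` is a hypothetical helper written for this note, not part of Git LFS or ESPnet.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer content copied from the bpe.model entry in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ca848c3a0b756847776bc5c8e8ae797ad73381cb4fe9db9109b3131e9416b5f6
size 244853
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

After downloading the real file, its SHA-256 digest and byte size can be compared against these fields to confirm the download is intact.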
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/RESULTS.md ADDED
@@ -0,0 +1,32 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Fri Dec 30 20:15:46 CST 2022`
+ - python version: `3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]`
+ - espnet version: `espnet 202211`
+ - pytorch version: `pytorch 1.12.1`
+ - Git hash: `e62de171f1d11015cb856f83780c61bd5ca7fa8f`
+ - Commit date: `Thu Dec 29 14:18:44 2022 -0500`
+
+ ## asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_ctc_asr_model_valid.cer_ctc.ave/dev|466|14671|92.5|5.5|2.0|1.2|8.7|77.3|
+ |decode_asr_ctc_asr_model_valid.cer_ctc.ave/test|1155|27500|92.7|4.9|2.3|1.1|8.3|70.6|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_ctc_asr_model_valid.cer_ctc.ave/dev|466|78259|97.2|0.9|1.9|1.2|4.0|77.3|
+ |decode_asr_ctc_asr_model_valid.cer_ctc.ave/test|1155|145066|97.1|0.9|2.0|1.1|4.0|70.6|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_ctc_asr_model_valid.cer_ctc.ave/dev|466|28296|94.7|3.1|2.2|1.2|6.5|77.3|
+ |decode_asr_ctc_asr_model_valid.cer_ctc.ave/test|1155|52113|95.0|2.7|2.2|1.1|6.1|70.6|
+
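
The columns in the result tables above follow the usual scoring convention: `Err` is the sum of substitutions, deletions, and insertions, and `Corr` is 100 minus substitutions and deletions, all as percentages of the reference length. A small sketch that checks this on the WER rows is given below; the helper names `parse_rows` and `is_consistent` are made up for this note, and the 0.15 tolerance is an assumption to absorb the one-decimal rounding in the table.

```python
# WER rows copied from RESULTS.md in this commit.
RESULTS = """\
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_ctc_asr_model_valid.cer_ctc.ave/dev|466|14671|92.5|5.5|2.0|1.2|8.7|77.3|
|decode_asr_ctc_asr_model_valid.cer_ctc.ave/test|1155|27500|92.7|4.9|2.3|1.1|8.3|70.6|
"""


def parse_rows(table: str) -> list:
    """Turn a pipe-delimited Markdown table into a list of dicts keyed by header."""
    lines = [ln.strip() for ln in table.strip().splitlines() if ln.strip().startswith("|")]
    header = lines[0].strip("|").split("|")
    # lines[1] is the |---|---| separator row, so data starts at index 2.
    return [dict(zip(header, ln.strip("|").split("|"))) for ln in lines[2:]]


def is_consistent(row: dict, tol: float = 0.15) -> bool:
    """Check Err ~= Sub + Del + Ins and Corr ~= 100 - Sub - Del within rounding."""
    sub, dele, ins = float(row["Sub"]), float(row["Del"]), float(row["Ins"])
    err_ok = abs(sub + dele + ins - float(row["Err"])) <= tol
    corr_ok = abs(100.0 - sub - dele - float(row["Corr"])) <= tol
    return err_ok and corr_ok


rows = parse_rows(RESULTS)
for row in rows:
    print(row["dataset"].rsplit("/", 1)[-1], row["Err"], is_consistent(row))
```

The same check can be pointed at the CER and TER rows, where the `Wrd` column counts characters and BPE tokens respectively.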
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/config.yaml ADDED
@@ -0,0 +1,698 @@
+ config: conf/tuning/train_asr_ctc_e_branchformer_e12_mlp1024_linear1024.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp
+ ngpu: 1
+ seed: 2022
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 2
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 47545
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 50
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - cer_ctc
+     - min
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 50000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_en_bpe500_sp/train/speech_shape
+ - exp/asr_stats_raw_en_bpe500_sp/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_raw_en_bpe500_sp/valid/speech_shape
+ - exp/asr_stats_raw_en_bpe500_sp/valid/text_shape.bpe
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train_sp/wav.scp
+     - speech
+     - kaldi_ark
+ -   - dump/raw/train_sp/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev/wav.scp
+     - speech
+     - kaldi_ark
+ -   - dump/raw/dev/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 0.002
+     weight_decay: 1.0e-06
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 15000
+ token_list:
+ - <blank>
+ - <unk>
+ - s
+ - ▁the
+ - t
+ - ▁a
+ - ▁and
+ - ▁to
+ - d
+ - e
+ - ▁of
+ - ''''
+ - n
+ - ing
+ - ▁in
+ - ▁i
+ - ▁that
+ - i
+ - a
+ - l
+ - p
+ - m
+ - y
+ - o
+ - ▁it
+ - ▁we
+ - c
+ - u
+ - ▁you
+ - ed
+ - ▁
+ - r
+ - ▁is
+ - re
+ - ▁this
+ - ar
+ - g
+ - ▁so
+ - al
+ - b
+ - ▁s
+ - or
+ - ▁f
+ - ▁c
+ - in
+ - k
+ - f
+ - ▁for
+ - ic
+ - er
+ - le
+ - ▁be
+ - ▁do
+ - ▁re
+ - ve
+ - ▁e
+ - ▁w
+ - ▁was
+ - es
+ - ▁they
+ - ly
+ - h
+ - ▁on
+ - v
+ - ▁are
+ - ri
+ - ▁have
+ - an
+ - ▁what
+ - ▁with
+ - ▁t
+ - w
+ - ur
+ - it
+ - ent
+ - ▁can
+ - ▁he
+ - ▁but
+ - ra
+ - ce
+ - ▁me
+ - ▁b
+ - ▁ma
+ - ▁p
+ - ll
+ - ▁st
+ - ▁one
+ - 'on'
+ - ▁about
+ - th
+ - ▁de
+ - en
+ - ▁all
+ - ▁not
+ - il
+ - ▁g
+ - ch
+ - at
+ - ▁there
+ - ▁mo
+ - ter
+ - ation
+ - tion
+ - ▁at
+ - ▁my
+ - ro
+ - ▁as
+ - te
+ - ▁le
+ - ▁con
+ - ▁like
+ - ▁people
+ - ▁or
+ - ▁an
+ - el
+ - ▁if
+ - ▁from
+ - ver
+ - ▁su
+ - ▁co
+ - ate
+ - ▁these
+ - ol
+ - ci
+ - ▁now
+ - ▁see
+ - ▁out
+ - ▁our
+ - ion
+ - ▁know
+ - ect
+ - ▁just
+ - as
+ - ▁ex
+ - ▁ch
+ - ▁d
+ - ▁when
+ - ▁very
+ - ▁think
+ - ▁who
+ - ▁because
+ - ▁go
+ - ▁up
+ - ▁us
+ - ▁pa
+ - ▁no
+ - ies
+ - ▁di
+ - ▁ho
+ - om
+ - ive
+ - ▁get
+ - id
+ - ▁o
+ - ▁hi
+ - un
+ - ▁how
+ - ▁by
+ - ir
+ - et
+ - ck
+ - ity
+ - ▁po
+ - ul
+ - ▁which
+ - ▁mi
+ - ▁some
+ - z
+ - ▁sp
+ - ▁un
+ - ▁going
+ - ▁pro
+ - ist
+ - ▁se
+ - ▁look
+ - ▁time
+ - ment
+ - de
+ - ▁more
+ - ▁had
+ - ng
+ - ▁would
+ - ge
+ - la
+ - ▁here
+ - ▁really
+ - x
+ - ▁your
+ - ▁them
+ - us
+ - me
+ - ▁en
+ - ▁two
+ - ▁k
+ - ▁li
+ - ▁world
+ - ne
+ - ow
+ - ▁way
+ - ▁want
+ - ▁work
+ - ▁don
+ - ▁lo
+ - ▁fa
+ - ▁were
+ - ▁their
+ - age
+ - vi
+ - ▁ha
+ - ac
+ - der
+ - est
+ - ▁bo
+ - am
+ - ▁other
+ - able
+ - ▁actually
+ - ▁sh
+ - ▁make
+ - ▁ba
+ - ▁la
+ - ine
+ - ▁into
+ - ▁where
+ - ▁could
+ - ▁comp
+ - ting
+ - ▁has
+ - ▁will
+ - ▁ne
+ - j
+ - ical
+ - ally
+ - ▁vi
+ - ▁things
+ - ▁te
+ - igh
+ - ▁say
+ - ▁years
+ - ers
+ - ▁ra
+ - ther
+ - ▁than
+ - ru
+ - ▁ro
+ - op
+ - ▁did
+ - ▁any
+ - ▁new
+ - ound
+ - ig
+ - ▁well
+ - mo
+ - ▁she
+ - ▁na
+ - ▁been
+ - he
+ - ▁thousand
+ - ▁car
+ - ▁take
+ - ▁right
+ - ▁then
+ - ▁need
+ - ▁start
+ - ▁hundred
+ - ▁something
+ - ▁over
+ - ▁com
+ - ia
+ - ▁kind
+ - um
+ - if
+ - ▁those
+ - ▁first
+ - ▁pre
+ - ta
+ - ▁said
+ - ize
+ - end
+ - ▁even
+ - ▁thing
+ - one
+ - ▁back
+ - ite
+ - ▁every
+ - ▁little
+ - ry
+ - ▁life
+ - ▁much
+ - ke
+ - ▁also
+ - ▁most
+ - ant
+ - per
+ - ▁three
+ - ▁come
+ - ▁lot
+ - ance
+ - ▁got
+ - ▁talk
+ - ▁per
+ - ▁inter
+ - ▁sa
+ - ▁use
+ - ▁mu
+ - ▁part
+ - ish
+ - ence
+ - ▁happen
+ - ▁bi
+ - ▁mean
+ - ough
+ - ▁qu
+ - ▁bu
+ - ▁day
+ - ▁ga
+ - ▁only
+ - ▁many
+ - ▁different
+ - ▁dr
+ - ▁th
+ - ▁show
+ - ful
+ - ▁down
+ - ated
+ - ▁good
+ - ▁tra
+ - ▁around
+ - ▁idea
+ - ▁human
+ - ous
+ - ▁put
+ - ▁through
+ - ▁five
+ - ▁why
+ - ▁change
+ - ▁real
+ - ff
+ - ible
+ - ▁fact
+ - ▁same
+ - ▁jo
+ - ▁live
+ - ▁year
+ - ▁problem
+ - ▁ph
+ - ▁four
+ - ▁give
+ - ▁big
+ - ▁tell
+ - ▁great
+ - ▁try
+ - ▁va
+ - ▁ru
+ - ▁system
+ - ▁six
+ - ▁plan
+ - ▁place
+ - ▁build
+ - ▁called
+ - ▁again
+ - ▁point
+ - ▁twenty
+ - ▁percent
+ - ▁nine
+ - ▁find
+ - ▁app
+ - ▁after
+ - ▁long
+ - ▁eight
+ - ▁imp
+ - ▁gene
+ - ▁design
+ - ▁today
+ - ▁should
+ - ▁made
+ - ious
+ - ▁came
+ - ▁learn
+ - ▁last
+ - ▁own
+ - way
+ - ▁turn
+ - ▁seven
+ - ▁high
+ - ▁question
+ - ▁person
+ - ▁brain
+ - ▁important
+ - ▁another
+ - ▁thought
+ - ▁trans
+ - ▁create
+ - ness
+ - ▁hu
+ - ▁power
+ - ▁act
+ - land
+ - ▁play
+ - ▁sort
+ - ▁old
+ - ▁before
+ - ▁course
+ - ▁understand
+ - ▁feel
+ - ▁might
+ - ▁each
+ - ▁million
+ - ▁better
+ - ▁together
+ - ▁ago
+ - ▁example
+ - ▁help
+ - ▁story
+ - ▁next
+ - ▁hand
+ - ▁school
+ - ▁water
+ - ▁develop
+ - ▁technology
+ - que
+ - ▁second
+ - ▁grow
+ - ▁still
+ - ▁cell
+ - ▁believe
+ - ▁number
+ - ▁small
+ - ▁between
+ - qui
+ - ▁data
+ - ▁become
+ - ▁america
+ - ▁maybe
+ - ▁space
+ - ▁project
+ - ▁organ
+ - ▁vo
+ - ▁children
+ - ▁book
+ - graph
+ - ▁open
+ - ▁fifty
+ - ▁picture
+ - ▁health
+ - ▁thirty
+ - ▁africa
+ - ▁reason
+ - ▁large
+ - ▁hard
+ - ▁computer
+ - ▁always
+ - ▁sense
+ - ▁money
+ - ▁women
+ - ▁everything
+ - ▁information
+ - ▁country
+ - ▁teach
+ - ▁energy
+ - ▁experience
+ - ▁food
+ - ▁process
+ - qua
+ - ▁interesting
+ - ▁future
+ - ▁science
+ - q
+ - '0'
+ - '5'
+ - '6'
+ - '9'
+ - '3'
+ - '8'
+ - '4'
+ - N
+ - A
+ - '7'
+ - S
+ - G
+ - F
+ - R
+ - L
+ - U
+ - E
+ - T
+ - H
+ - _
+ - B
+ - D
+ - J
+ - M
+ - ă
+ - ō
+ - ť
+ - '2'
+ - '-'
+ - '1'
+ - C
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: null
+     zero_infinity: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/en_token_list/bpe_unigram500/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ frontend: default
+ frontend_conf:
+     n_fft: 512
+     win_length: 400
+     hop_length: 160
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 27
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.05
+     num_time_mask: 5
+ normalize: global_mvn
+ normalize_conf:
+     stats_file: exp/asr_stats_raw_en_bpe500_sp/train/feats_stats.npz
+ model: espnet
+ model_conf:
+     ctc_weight: 1.0
+     lsm_weight: 0.1
+     length_normalized_loss: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: e_branchformer
+ encoder_conf:
+     output_size: 256
+     attention_heads: 4
+     attention_layer_type: rel_selfattn
+     pos_enc_layer_type: rel_pos
+     rel_pos_type: latest
+     cgmlp_linear_units: 1024
+     cgmlp_conv_kernel: 31
+     use_linear_after_conv: false
+     gate_activation: identity
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     input_layer: conv2d
+     layer_drop_rate: 0.0
+     linear_units: 1024
+     positionwise_layer_type: linear
+     use_ffn: true
+     macaron_ffn: true
+     merge_conv_kernel: 31
+ postencoder: null
+ postencoder_conf: {}
+ decoder: rnn
+ decoder_conf: {}
+ preprocessor: default
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202211'
+ distributed: true
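As a rough illustration of what the frontend_conf values above imply, the sketch below converts the STFT settings into durations. It assumes that `fs: 16k` means 16000 Hz and that the `conv2d` input layer reduces the frame rate by a factor of 4; neither is stated in this diff, and the variable names are illustrative only.

```python
# Values copied from frontend_conf in config.yaml.
n_fft = 512
win_length = 400
hop_length = 160
fs_hz = 16_000  # assumption: "16k" denotes 16000 Hz

# Assumption (not stated in the diff): input_layer "conv2d" subsamples
# the time axis by a factor of 4 before the encoder blocks.
subsampling = 4

window_ms = 1000 * win_length / fs_hz    # analysis window length
hop_ms = 1000 * hop_length / fs_hz       # shift between STFT frames
freq_bins = n_fft // 2 + 1               # one-sided STFT bins
encoder_frame_ms = hop_ms * subsampling  # shift between encoder frames

print(f"window: {window_ms} ms, hop: {hop_ms} ms")
print(f"STFT bins: {freq_bins}, encoder frame shift: {encoder_frame_ms} ms")
```

Under these assumptions the window is 25 ms, the hop is 10 ms, and each encoder output frame covers a 40 ms step.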
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/acc.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/backward_time.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/cer.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/cer_ctc.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/forward_time.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/iter_time.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/loss.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/loss_att.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/loss_ctc.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/optim0_lr0.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/optim_step_time.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/train_time.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/images/wer.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/valid.cer_ctc.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b32223bab83473449b2bc5212a295e9d0fc3ba7d23e0284720ff8fe20515df10
+ size 101377487
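The three lines above are a Git LFS pointer rather than the checkpoint itself. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for this illustration, not part of Git LFS or ESPnet.

```python
# Pointer text copied from the diff above (without the "+" markers).
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b32223bab83473449b2bc5212a295e9d0fc3ba7d23e0284720ff8fe20515df10
size 101377487
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer = parse_lfs_pointer(POINTER_TEXT)
hash_algo, _, digest = pointer["oid"].partition(":")
size_bytes = int(pointer["size"])

print(hash_algo, len(digest), f"{size_bytes / 2**20:.1f} MiB")
```

The `size` field is the byte count of the real checkpoint, so it can be compared against the downloaded file before loading it.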
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202211'
+ files:
+   asr_model_file: exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/valid.cer_ctc.ave_10best.pth
+ python: "3.9.15 (main, Nov 24 2022, 14:31:59) \n[GCC 11.2.0]"
+ timestamp: 1672453009.401243
+ torch: 1.12.1
+ yaml_files:
+   asr_train_config: exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_raw_en_bpe500_sp/config.yaml
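The `timestamp` field in meta.yaml is easier to read as a calendar date. The sketch below converts it, assuming the value is seconds since the Unix epoch (the diff does not say so explicitly); `packed_at` is an illustrative name.

```python
from datetime import datetime, timezone

timestamp = 1672453009.401243  # copied from meta.yaml

# Assumption: the value is seconds since the Unix epoch.
packed_at = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(packed_at.isoformat())
```

Under that assumption the value falls on 2022-12-31 (UTC), which is consistent with the Python build date recorded in the same file.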