Automatic Speech Recognition
ESPnet
English
audio
pyf98 committed on
Commit
8126b57
•
1 Parent(s): c213450

add model files

Files changed (20)
  1. README.md +931 -0
  2. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/RESULTS.md +29 -0
  3. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/config.yaml +815 -0
  4. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/acc.png +0 -0
  5. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/backward_time.png +0 -0
  6. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/cer.png +0 -0
  7. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/cer_ctc.png +0 -0
  8. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/forward_time.png +0 -0
  9. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/gpu_max_cached_mem_GB.png +0 -0
  10. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/iter_time.png +0 -0
  11. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/loss.png +0 -0
  12. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/loss_att.png +0 -0
  13. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/loss_ctc.png +0 -0
  14. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/optim0_lr0.png +0 -0
  15. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/optim_step_time.png +0 -0
  16. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/train_time.png +0 -0
  17. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/wer.png +0 -0
  18. exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/valid.acc.ave_10best.pth +3 -0
  19. meta.yaml +8 -0
  20. score.log +46 -0
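
The checkpoint name `valid.acc.ave_10best.pth` in the file list above suggests a parameter average over the 10 checkpoints with the best validation accuracy, which would match `keep_nbest_models: 10` and the `valid` / `acc` / `max` criterion in the config. The sketch below illustrates that idea on toy data only; `average_checkpoints` is a hypothetical helper that works on plain Python lists, not ESPnet's actual averaging code.

```python
from typing import Dict, List


def average_checkpoints(states: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Element-wise mean of several parameter dictionaries with identical keys.

    Each dictionary stands in for one checkpoint (assumption: flat lists of
    floats instead of real tensors).
    """
    if not states:
        raise ValueError("need at least one checkpoint")
    averaged: Dict[str, List[float]] = {}
    for key in states[0]:
        # Group the i-th value of this parameter across all checkpoints.
        columns = zip(*(state[key] for state in states))
        averaged[key] = [sum(col) / len(states) for col in columns]
    return averaged


# Two toy "checkpoints" with a single parameter named "w".
toy = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
avg = average_checkpoints(toy)
print(avg)
```

With real checkpoints the same mean would be taken over tensors in each checkpoint's state dictionary.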
README.md ADDED
@@ -0,0 +1,931 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: en
+ datasets:
+ - slurp_entity
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `pyf98/slurp_entity_e_branchformer`
+
+ This model was trained by Yifan Peng using the slurp_entity recipe in [espnet](https://github.com/espnet/espnet/).
+
+ References:
+ - [E-Branchformer: Branchformer with Enhanced merging for speech recognition (SLT 2022)](https://arxiv.org/abs/2210.00077)
+ - [Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding (ICML 2022)](https://proceedings.mlr.press/v162/peng22a.html)
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ ```bash
+ cd espnet
+ git checkout 4bbd29a40cc7e2259996d30c0c76d3d789c1153d
+ pip install -e .
+ cd egs2/slurp_entity/asr1
+ ./run.sh --skip_data_prep false --skip_train true --download_model pyf98/slurp_entity_e_branchformer
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Mon Feb 27 19:14:30 CST 2023`
+ - python version: `3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]`
+ - espnet version: `espnet 202301`
+ - pytorch version: `pytorch 1.13.1`
+ - Git hash: `4bbd29a40cc7e2259996d30c0c76d3d789c1153d`
+ - Commit date: `Sat Feb 25 21:54:03 2023 -0600`
+
+ ## exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave_10best/devel|8690|178058|84.6|7.6|7.8|3.2|18.6|51.2|
+ |decode_asr_asr_model_valid.acc.ave_10best/test|13078|262176|83.7|7.7|8.6|3.0|19.3|49.7|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave_10best/devel|8690|847400|90.8|3.0|6.2|3.5|12.7|51.2|
+ |decode_asr_asr_model_valid.acc.ave_10best/test|13078|1245475|89.7|3.1|7.2|3.4|13.6|49.7|
+
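
As a quick sanity check on the WER and CER tables above: assuming the usual sclite convention, the columns satisfy Err = Sub + Del + Ins and Corr = 100 - Sub - Del, all as percentages of reference tokens. The sketch below re-derives these from the reported rows. The helper name `check_row` and the 0.15-point tolerance are illustrative assumptions; the tolerance allows for each column being rounded independently.

```python
# Rows copied from the tables above: (name, Corr, Sub, Del, Ins, Err), in percent.
ROWS = [
    ("WER devel", 84.6, 7.6, 7.8, 3.2, 18.6),
    ("WER test", 83.7, 7.7, 8.6, 3.0, 19.3),
    ("CER devel", 90.8, 3.0, 6.2, 3.5, 12.7),
    ("CER test", 89.7, 3.1, 7.2, 3.4, 13.6),
]


def check_row(corr, sub, dele, ins, err, tol=0.15):
    """Return True if Err = Sub + Del + Ins and Corr = 100 - Sub - Del hold
    within `tol` percentage points."""
    err_ok = abs((sub + dele + ins) - err) <= tol
    corr_ok = abs((100.0 - sub - dele) - corr) <= tol
    return err_ok and corr_ok


results = {name: check_row(*vals) for name, *vals in ROWS}
for name, ok in results.items():
    print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

Note that the CER test row sums to 13.7 from the rounded columns while the table reports 13.6, which is why a small tolerance is needed.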
+
+ ### Intent Classification
+
+ - Valid Intent Classification Result:
+ 0.8781357882623706
+ - Test Intent Classification Result:
+ 0.8743691695977979
+
+ ### Entity
+
+ |Slu f1|Precision|Recall|F-Measure|
+ |:---:|:---:|:---:|:---:|
+ | test | 0.7940 | 0.7582 | 0.7757 |
+
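
The F-Measure in the entity table above is consistent with the harmonic mean of the reported precision and recall. A minimal sketch of that arithmetic follows; `f_measure` is a hypothetical helper name, and the exact scoring script used by the recipe is not shown in this commit.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall copied from the entity table above.
f1 = f_measure(0.7940, 0.7582)
print(f"{f1:.4f}")  # prints 0.7757
```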
+
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 60
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 64
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_en_word/train/speech_shape
+ - exp/asr_stats_raw_en_word/train/text_shape.word
+ valid_shape_file:
+ - exp/asr_stats_raw_en_word/valid/speech_shape
+ - exp/asr_stats_raw_en_word/valid/text_shape.word
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train/wav.scp
+     - speech
+     - kaldi_ark
+ -   - dump/raw/train/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/devel/wav.scp
+     - speech
+     - kaldi_ark
+ -   - dump/raw/devel/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.001
+     weight_decay: 1.0e-06
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 35000
+ token_list:
195
+ - <blank>
196
+ - <unk>
197
+ - ▁SEP
198
+ - ▁FILL
199
+ - s
200
+ - ▁the
201
+ - a
202
+ - ▁to
203
+ - ▁i
204
+ - ▁me
205
+ - e
206
+ - ▁s
207
+ - ▁a
208
+ - i
209
+ - ▁you
210
+ - ▁what
211
+ - er
212
+ - ing
213
+ - u
214
+ - ▁is
215
+ - ''''
216
+ - o
217
+ - p
218
+ - ▁in
219
+ - ▁p
220
+ - y
221
+ - ▁my
222
+ - ▁please
223
+ - d
224
+ - c
225
+ - m
226
+ - ▁b
227
+ - l
228
+ - ▁m
229
+ - ▁c
230
+ - st
231
+ - date
232
+ - n
233
+ - ▁d
234
+ - le
235
+ - b
236
+ - ▁for
237
+ - re
238
+ - t
239
+ - ▁on
240
+ - en
241
+ - h
242
+ - 'on'
243
+ - ar
244
+ - person
245
+ - ▁re
246
+ - ▁f
247
+ - ▁g
248
+ - ▁of
249
+ - an
250
+ - ▁
251
+ - g
252
+ - ▁today
253
+ - ▁t
254
+ - or
255
+ - ▁it
256
+ - ▁this
257
+ - ▁h
258
+ - r
259
+ - f
260
+ - at
261
+ - ch
262
+ - ce
263
+ - place_name
264
+ - ▁email
265
+ - ▁do
266
+ - es
267
+ - ri
268
+ - ▁e
269
+ - ▁w
270
+ - ic
271
+ - in
272
+ - ▁that
273
+ - event_name
274
+ - ▁play
275
+ - ▁and
276
+ - al
277
+ - ▁n
278
+ - ▁can
279
+ - email_query
280
+ - ve
281
+ - ▁new
282
+ - day
283
+ - it
284
+ - ate
285
+ - ▁from
286
+ - ▁have
287
+ - k
288
+ - time
289
+ - ▁am
290
+ - media_type
291
+ - email_sendemail
292
+ - ent
293
+ - ▁olly
294
+ - qa_factoid
295
+ - se
296
+ - v
297
+ - et
298
+ - ck
299
+ - ▁any
300
+ - calendar_set
301
+ - ly
302
+ - th
303
+ - ▁how
304
+ - ▁meeting
305
+ - ed
306
+ - ▁tell
307
+ - ▁st
308
+ - x
309
+ - ur
310
+ - ro
311
+ - ▁at
312
+ - nd
313
+ - ▁list
314
+ - w
315
+ - ▁u
316
+ - ou
317
+ - ▁not
318
+ - ▁about
319
+ - ▁an
320
+ - ▁o
321
+ - general_negate
322
+ - ut
323
+ - ▁time
324
+ - ▁be
325
+ - ▁ch
326
+ - ▁are
327
+ - social_post
328
+ - business_name
329
+ - la
330
+ - ty
331
+ - play_music
332
+ - ot
333
+ - general_quirky
334
+ - ▁l
335
+ - ▁sh
336
+ - ▁tweet
337
+ - om
338
+ - ▁week
339
+ - um
340
+ - ▁one
341
+ - ter
342
+ - ▁he
343
+ - ▁up
344
+ - ▁com
345
+ - general_praise
346
+ - weather_query
347
+ - ▁next
348
+ - ▁th
349
+ - ▁check
350
+ - calendar_query
351
+ - ▁last
352
+ - ▁ro
353
+ - ad
354
+ - is
355
+ - ▁with
356
+ - ay
357
+ - ▁send
358
+ - pe
359
+ - ▁pm
360
+ - ▁tomorrow
361
+ - ▁j
362
+ - un
363
+ - ▁train
364
+ - general_explain
365
+ - ▁v
366
+ - one
367
+ - ▁r
368
+ - ra
369
+ - news_query
370
+ - ation
371
+ - ▁emails
372
+ - us
373
+ - if
374
+ - ct
375
+ - ▁co
376
+ - ▁add
377
+ - ▁will
378
+ - ▁se
379
+ - nt
380
+ - ▁was
381
+ - ine
382
+ - ▁de
383
+ - ▁set
384
+ - ▁ex
385
+ - ▁would
386
+ - ir
387
+ - ow
388
+ - ber
389
+ - general_repeat
390
+ - ight
391
+ - ook
392
+ - ▁again
393
+ - ▁song
394
+ - currency_name
395
+ - ll
396
+ - ▁ha
397
+ - ▁go
398
+ - relation
399
+ - te
400
+ - ion
401
+ - and
402
+ - ▁y
403
+ - ▁ye
404
+ - general_affirm
405
+ - general_confirm
406
+ - ery
407
+ - ▁po
408
+ - ff
409
+ - ▁we
410
+ - ▁turn
411
+ - ▁did
412
+ - ▁mar
413
+ - ▁alarm
414
+ - ▁like
415
+ - datetime_query
416
+ - ers
417
+ - ▁all
418
+ - ▁remind
419
+ - ▁so
420
+ - qa_definition
421
+ - ▁calendar
422
+ - end
423
+ - ▁said
424
+ - ci
425
+ - ▁off
426
+ - ▁john
427
+ - ▁day
428
+ - ss
429
+ - pla
430
+ - ume
431
+ - ▁get
432
+ - ail
433
+ - pp
434
+ - z
435
+ - ry
436
+ - am
437
+ - ▁need
438
+ - as
439
+ - ▁thank
440
+ - ▁wh
441
+ - ▁want
442
+ - ▁right
443
+ - ▁jo
444
+ - ▁facebook
445
+ - ▁k
446
+ - ge
447
+ - ld
448
+ - ▁fri
449
+ - ▁two
450
+ - general_dontcare
451
+ - ▁news
452
+ - ol
453
+ - oo
454
+ - ant
455
+ - ▁five
456
+ - ▁event
457
+ - ake
458
+ - definition_word
459
+ - transport_type
460
+ - ▁your
461
+ - vi
462
+ - orn
463
+ - op
464
+ - ▁weather
465
+ - ome
466
+ - ▁app
467
+ - ▁lo
468
+ - de
469
+ - ▁music
470
+ - weather_descriptor
471
+ - ak
472
+ - ke
473
+ - ▁there
474
+ - ▁si
475
+ - ▁lights
476
+ - ▁now
477
+ - ▁mo
478
+ - calendar_remove
479
+ - our
480
+ - ▁dollar
481
+ - food_type
482
+ - me
483
+ - ▁more
484
+ - ▁no
485
+ - ▁birthday
486
+ - orrect
487
+ - ▁rep
488
+ - ▁show
489
+ - play_radio
490
+ - ▁mon
491
+ - ▁does
492
+ - ood
493
+ - ag
494
+ - li
495
+ - ▁sto
496
+ - ▁contact
497
+ - cket
498
+ - email_querycontact
499
+ - ▁ev
500
+ - ▁could
501
+ - ange
502
+ - ▁just
503
+ - out
504
+ - ame
505
+ - .
506
+ - ▁ja
507
+ - ▁confirm
508
+ - qa_currency
509
+ - ▁man
510
+ - ▁late
511
+ - ▁think
512
+ - ▁some
513
+ - timeofday
514
+ - ▁bo
515
+ - qa_stock
516
+ - ong
517
+ - ▁start
518
+ - ▁work
519
+ - ▁ten
520
+ - int
521
+ - ▁command
522
+ - all
523
+ - ▁make
524
+ - ▁la
525
+ - j
526
+ - ▁answ
527
+ - ▁hour
528
+ - ▁cle
529
+ - ah
530
+ - ▁find
531
+ - ▁service
532
+ - ▁fa
533
+ - qu
534
+ - general_commandstop
535
+ - ai
536
+ - ▁when
537
+ - ▁te
538
+ - ▁by
539
+ - social_query
540
+ - ard
541
+ - ▁tw
542
+ - ul
543
+ - id
544
+ - ▁seven
545
+ - ▁where
546
+ - ▁much
547
+ - art
548
+ - ▁appointment
549
+ - ver
550
+ - artist_name
551
+ - el
552
+ - device_type
553
+ - ▁know
554
+ - ▁three
555
+ - ▁events
556
+ - ▁tr
557
+ - ▁li
558
+ - ork
559
+ - red
560
+ - ect
561
+ - ▁let
562
+ - ▁respon
563
+ - ▁par
564
+ - zz
565
+ - ▁give
566
+ - ▁twenty
567
+ - ▁ti
568
+ - ▁curre
569
+ - play_podcasts
570
+ - ▁radio
571
+ - cooking_recipe
572
+ - transport_query
573
+ - ▁con
574
+ - gh
575
+ - ▁le
576
+ - lists_query
577
+ - ▁rem
578
+ - recommendation_events
579
+ - house_place
580
+ - alarm_set
581
+ - play_audiobook
582
+ - ist
583
+ - ase
584
+ - music_genre
585
+ - ive
586
+ - ast
587
+ - player_setting
588
+ - ort
589
+ - lly
590
+ - news_topic
591
+ - list_name
592
+ - ▁playlist
593
+ - ▁ne
594
+ - business_type
595
+ - personal_info
596
+ - ind
597
+ - ust
598
+ - di
599
+ - ress
600
+ - recommendation_locations
601
+ - lists_createoradd
602
+ - iot_hue_lightoff
603
+ - lists_remove
604
+ - ord
605
+ - ▁light
606
+ - ere
607
+ - alarm_query
608
+ - audio_volume_mute
609
+ - music_query
610
+ - ▁audio
611
+ - rain
612
+ - ▁date
613
+ - ▁order
614
+ - audio_volume_up
615
+ - ▁ar
616
+ - ▁podcast
617
+ - transport_ticket
618
+ - mail
619
+ - iot_hue_lightchange
620
+ - iot_coffee
621
+ - radio_name
622
+ - ill
623
+ - ▁ri
624
+ - '@'
625
+ - takeaway_query
626
+ - song_name
627
+ - takeaway_order
628
+ - ▁ra
629
+ - email_addcontact
630
+ - play_game
631
+ - book
632
+ - transport_traffic
633
+ - ▁house
634
+ - music_likeness
635
+ - her
636
+ - transport_taxi
637
+ - iot_hue_lightdim
638
+ - ment
639
+ - ght
640
+ - fo
641
+ - order_type
642
+ - color_type
643
+ - '1'
644
+ - ven
645
+ - ould
646
+ - general_joke
647
+ - ess
648
+ - ain
649
+ - qa_maths
650
+ - ▁place
651
+ - ▁twe
652
+ - cast
653
+ - iot_cleaning
654
+ - ▁che
655
+ - ▁cont
656
+ - ith
657
+ - audiobook_name
658
+ - email_address
659
+ - game_name
660
+ - ▁cal
661
+ - general_frequency
662
+ - ▁tom
663
+ - ▁food
664
+ - act
665
+ - iot_hue_lightup
666
+ - '2'
667
+ - alarm_remove
668
+ - podcast_descriptor
669
+ - ▁definition
670
+ - audio_volume_down
671
+ - ▁media
672
+ - email_folder
673
+ - dia
674
+ - meal_type
675
+ - ▁mus
676
+ - recommendation_movies
677
+ - ▁ad
678
+ - ree
679
+ - pt
680
+ - now
681
+ - playlist_name
682
+ - ▁person
683
+ - change_amount
684
+ - ▁pla
685
+ - escri
686
+ - datetime_convert
687
+ - podcast_name
688
+ - ▁ab
689
+ - time_zone
690
+ - ▁def
691
+ - ting
692
+ - iot_wemo_on
693
+ - music_settings
694
+ - iot_wemo_off
695
+ - orre
696
+ - cy
697
+ - ank
698
+ - music_descriptor
699
+ - lar
700
+ - app_name
701
+ - row
702
+ - joke_type
703
+ - xt
704
+ - of
705
+ - ition
706
+ - ▁meet
707
+ - ink
708
+ - ▁confir
709
+ - transport_agency
710
+ - general_greet
711
+ - ▁business
712
+ - ▁art
713
+ - ▁ag
714
+ - urn
715
+ - escript
716
+ - rom
717
+ - ▁rel
718
+ - ▁au
719
+ - ▁currency
720
+ - audio_volume_other
721
+ - iot_hue_lighton
722
+ - ▁artist
723
+ - '?'
724
+ - ▁bus
725
+ - cooking_type
726
+ - movie_name
727
+ - coffee_type
728
+ - ingredient
729
+ - ather
730
+ - music_dislikeness
731
+ - sp
732
+ - q
733
+ - ▁ser
734
+ - esc
735
+ - ▁bir
736
+ - ▁cur
737
+ - name
738
+ - ▁tran
739
+ - ▁hou
740
+ - ek
741
+ - uch
742
+ - ▁conf
743
+ - ▁face
744
+ - '9'
745
+ - ▁birth
746
+ - I
747
+ - sw
748
+ - transport_descriptor
749
+ - ▁comm
750
+ - lease
751
+ - transport_name
752
+ - aid
753
+ - movie_type
754
+ - ▁device
755
+ - alarm_type
756
+ - audiobook_author
757
+ - '5'
758
+ - drink_type
759
+ - ▁joh
760
+ - ▁defin
761
+ - word
762
+ - ▁curren
763
+ - order
764
+ - iness
765
+ - W
766
+ - cooking_query
767
+ - sport_type
768
+ - ▁relation
769
+ - oint
770
+ - H
771
+ - '8'
772
+ - A
773
+ - '0'
774
+ - ▁dol
775
+ - vice
776
+ - ▁pers
777
+ - '&'
778
+ - T
779
+ - ▁appoint
780
+ - _
781
+ - '7'
782
+ - '3'
783
+ - '-'
784
+ - game_type
785
+ - ▁pod
786
+ - N
787
+ - M
788
+ - E
789
+ - list
790
+ - music_album
791
+ - dio
792
+ - ▁transport
793
+ - qa_query
794
+ - C
795
+ - O
796
+ - U
797
+ - query_detail
798
+ - ']'
799
+ - '['
800
+ - descriptor
801
+ - ':'
802
+ - spon
803
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: null
+     zero_infinity: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: word
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ aux_ctc_tasks: []
+ frontend: default
+ frontend_conf:
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 30
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_range:
+     - 0
+     - 40
+     num_time_mask: 2
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: e_branchformer
+ encoder_conf:
+     output_size: 512
+     attention_heads: 8
+     attention_layer_type: rel_selfattn
+     pos_enc_layer_type: rel_pos
+     rel_pos_type: latest
+     cgmlp_linear_units: 3072
+     cgmlp_conv_kernel: 31
+     use_linear_after_conv: false
+     gate_activation: identity
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     input_layer: conv2d
+     layer_drop_rate: 0.1
+     linear_units: 1024
+     positionwise_layer_type: linear
+     macaron_ffn: true
+     use_ffn: true
+     merge_conv_kernel: 31
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 8
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.1
+     src_attention_dropout_rate: 0.1
+     layer_drop_rate: 0.2
+ preprocessor: default
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202301'
+ distributed: false
+ ```
+
+ </details>
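
The config above pairs `optim_conf.lr: 0.001` with `scheduler: warmuplr` and `warmup_steps: 35000`. Assuming the Noam-style rule documented for ESPnet's `WarmupLR` (a linear ramp up to the base rate, then inverse-square-root decay), the schedule can be sketched in plain Python as below. The function `warmup_lr` is a standalone illustration, not the library class.

```python
def warmup_lr(step: int, base_lr: float = 0.001, warmup_steps: int = 35000) -> float:
    """Learning rate at a 1-based optimizer step.

    Assumed rule: base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# The rate peaks at base_lr exactly when step == warmup_steps.
for step in (1, 17500, 35000, 140000):
    print(step, warmup_lr(step))
```

Under this rule the rate is half the base value at step 17500 (mid-warmup) and again at step 140000, four times the warmup length.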
+
+
+
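
The `model_conf` entry `ctc_weight: 0.3` in the config above indicates a hybrid CTC/attention objective, in which the CTC loss and the attention-decoder loss are interpolated. A minimal sketch of that interpolation follows, assuming the common form `w * L_ctc + (1 - w) * L_att`; `hybrid_loss` is a hypothetical name and the inputs are plain floats rather than tensors.

```python
def hybrid_loss(loss_ctc: float, loss_att: float, ctc_weight: float = 0.3) -> float:
    """Interpolate the CTC and attention losses with weight `ctc_weight` on CTC."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


# Toy values: 0.3 * 2.0 + 0.7 * 1.0
total = hybrid_loss(loss_ctc=2.0, loss_att=1.0)
print(total)
```

With `ctc_weight=0.0` this reduces to a pure attention loss, and with `ctc_weight=1.0` to pure CTC.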
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/RESULTS.md ADDED
@@ -0,0 +1,29 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Mon Feb 27 19:14:30 CST 2023`
+ - python version: `3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]`
+ - espnet version: `espnet 202301`
+ - pytorch version: `pytorch 1.13.1`
+ - Git hash: `4bbd29a40cc7e2259996d30c0c76d3d789c1153d`
+ - Commit date: `Sat Feb 25 21:54:03 2023 -0600`
+
+ ## exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave_10best/devel|8690|178058|84.6|7.6|7.8|3.2|18.6|51.2|
+ |decode_asr_asr_model_valid.acc.ave_10best/test|13078|262176|83.7|7.7|8.6|3.0|19.3|49.7|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave_10best/devel|8690|847400|90.8|3.0|6.2|3.5|12.7|51.2|
+ |decode_asr_asr_model_valid.acc.ave_10best/test|13078|1245475|89.7|3.1|7.2|3.4|13.6|49.7|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/config.yaml ADDED
@@ -0,0 +1,815 @@
+ config: conf/tuning/train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 60
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 64
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_en_word/train/speech_shape
+ - exp/asr_stats_raw_en_word/train/text_shape.word
+ valid_shape_file:
+ - exp/asr_stats_raw_en_word/valid/speech_shape
+ - exp/asr_stats_raw_en_word/valid/text_shape.word
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train/wav.scp
+     - speech
+     - kaldi_ark
+ -   - dump/raw/train/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/devel/wav.scp
+     - speech
+     - kaldi_ark
+ -   - dump/raw/devel/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.001
+     weight_decay: 1.0e-06
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 35000
+ token_list:
115
+ - <blank>
116
+ - <unk>
117
+ - ▁SEP
118
+ - ▁FILL
119
+ - s
120
+ - ▁the
121
+ - a
122
+ - ▁to
123
+ - ▁i
124
+ - ▁me
125
+ - e
126
+ - ▁s
127
+ - ▁a
128
+ - i
129
+ - ▁you
130
+ - ▁what
131
+ - er
132
+ - ing
133
+ - u
134
+ - ▁is
135
+ - ''''
136
+ - o
137
+ - p
138
+ - ▁in
139
+ - ▁p
140
+ - y
141
+ - ▁my
142
+ - ▁please
143
+ - d
144
+ - c
145
+ - m
146
+ - ▁b
147
+ - l
148
+ - ▁m
149
+ - ▁c
150
+ - st
151
+ - date
152
+ - n
153
+ - ▁d
154
+ - le
155
+ - b
156
+ - ▁for
157
+ - re
158
+ - t
159
+ - ▁on
160
+ - en
161
+ - h
162
+ - 'on'
163
+ - ar
164
+ - person
165
+ - ▁re
166
+ - ▁f
167
+ - ▁g
168
+ - ▁of
169
+ - an
170
+ - ▁
171
+ - g
172
+ - ▁today
173
+ - ▁t
174
+ - or
175
+ - ▁it
176
+ - ▁this
177
+ - ▁h
178
+ - r
179
+ - f
180
+ - at
181
+ - ch
182
+ - ce
183
+ - place_name
184
+ - ▁email
185
+ - ▁do
186
+ - es
187
+ - ri
188
+ - ▁e
189
+ - ▁w
190
+ - ic
191
+ - in
192
+ - ▁that
193
+ - event_name
194
+ - ▁play
195
+ - ▁and
196
+ - al
197
+ - ▁n
198
+ - ▁can
199
+ - email_query
200
+ - ve
201
+ - ▁new
202
+ - day
203
+ - it
204
+ - ate
205
+ - ▁from
206
+ - ▁have
207
+ - k
208
+ - time
209
+ - ▁am
210
+ - media_type
211
+ - email_sendemail
212
+ - ent
213
+ - ▁olly
214
+ - qa_factoid
215
+ - se
216
+ - v
217
+ - et
218
+ - ck
219
+ - ▁any
220
+ - calendar_set
221
+ - ly
222
+ - th
223
+ - ▁how
224
+ - ▁meeting
225
+ - ed
226
+ - ▁tell
227
+ - ▁st
228
+ - x
229
+ - ur
230
+ - ro
231
+ - ▁at
232
+ - nd
233
+ - ▁list
234
+ - w
235
+ - ▁u
236
+ - ou
237
+ - ▁not
238
+ - ▁about
239
+ - ▁an
240
+ - ▁o
241
+ - general_negate
242
+ - ut
243
+ - ▁time
244
+ - ▁be
245
+ - ▁ch
246
+ - ▁are
247
+ - social_post
248
+ - business_name
249
+ - la
250
+ - ty
251
+ - play_music
252
+ - ot
253
+ - general_quirky
254
+ - ▁l
255
+ - ▁sh
256
+ - ▁tweet
257
+ - om
258
+ - ▁week
259
+ - um
260
+ - ▁one
261
+ - ter
262
+ - ▁he
263
+ - ▁up
264
+ - ▁com
265
+ - general_praise
+ - weather_query
+ - ▁next
+ - ▁th
+ - ▁check
+ - calendar_query
+ - ▁last
+ - ▁ro
+ - ad
+ - is
+ - ▁with
+ - ay
+ - ▁send
+ - pe
+ - ▁pm
+ - ▁tomorrow
+ - ▁j
+ - un
+ - ▁train
+ - general_explain
+ - ▁v
+ - one
+ - ▁r
+ - ra
+ - news_query
+ - ation
+ - ▁emails
+ - us
+ - if
+ - ct
+ - ▁co
+ - ▁add
+ - ▁will
+ - ▁se
+ - nt
+ - ▁was
+ - ine
+ - ▁de
+ - ▁set
+ - ▁ex
+ - ▁would
+ - ir
+ - ow
+ - ber
+ - general_repeat
+ - ight
+ - ook
+ - ▁again
+ - ▁song
+ - currency_name
+ - ll
+ - ▁ha
+ - ▁go
+ - relation
+ - te
+ - ion
+ - and
+ - ▁y
+ - ▁ye
+ - general_affirm
+ - general_confirm
+ - ery
+ - ▁po
+ - ff
+ - ▁we
+ - ▁turn
+ - ▁did
+ - ▁mar
+ - ▁alarm
+ - ▁like
+ - datetime_query
+ - ers
+ - ▁all
+ - ▁remind
+ - ▁so
+ - qa_definition
+ - ▁calendar
+ - end
+ - ▁said
+ - ci
+ - ▁off
+ - ▁john
+ - ▁day
+ - ss
+ - pla
+ - ume
+ - ▁get
+ - ail
+ - pp
+ - z
+ - ry
+ - am
+ - ▁need
+ - as
+ - ▁thank
+ - ▁wh
+ - ▁want
+ - ▁right
+ - ▁jo
+ - ▁facebook
+ - ▁k
+ - ge
+ - ld
+ - ▁fri
+ - ▁two
+ - general_dontcare
+ - ▁news
+ - ol
+ - oo
+ - ant
+ - ▁five
+ - ▁event
+ - ake
+ - definition_word
+ - transport_type
+ - ▁your
+ - vi
+ - orn
+ - op
+ - ▁weather
+ - ome
+ - ▁app
+ - ▁lo
+ - de
+ - ▁music
+ - weather_descriptor
+ - ak
+ - ke
+ - ▁there
+ - ▁si
+ - ▁lights
+ - ▁now
+ - ▁mo
+ - calendar_remove
+ - our
+ - ▁dollar
+ - food_type
+ - me
+ - ▁more
+ - ▁no
+ - ▁birthday
+ - orrect
+ - ▁rep
+ - ▁show
+ - play_radio
+ - ▁mon
+ - ▁does
+ - ood
+ - ag
+ - li
+ - ▁sto
+ - ▁contact
+ - cket
+ - email_querycontact
+ - ▁ev
+ - ▁could
+ - ange
+ - ▁just
+ - out
+ - ame
+ - .
+ - ▁ja
+ - ▁confirm
+ - qa_currency
+ - ▁man
+ - ▁late
+ - ▁think
+ - ▁some
+ - timeofday
+ - ▁bo
+ - qa_stock
+ - ong
+ - ▁start
+ - ▁work
+ - ▁ten
+ - int
+ - ▁command
+ - all
+ - ▁make
+ - ▁la
+ - j
+ - ▁answ
+ - ▁hour
+ - ▁cle
+ - ah
+ - ▁find
+ - ▁service
+ - ▁fa
+ - qu
+ - general_commandstop
+ - ai
+ - ▁when
+ - ▁te
+ - ▁by
+ - social_query
+ - ard
+ - ▁tw
+ - ul
+ - id
+ - ▁seven
+ - ▁where
+ - ▁much
+ - art
+ - ▁appointment
+ - ver
+ - artist_name
+ - el
+ - device_type
+ - ▁know
+ - ▁three
+ - ▁events
+ - ▁tr
+ - ▁li
+ - ork
+ - red
+ - ect
+ - ▁let
+ - ▁respon
+ - ▁par
+ - zz
+ - ▁give
+ - ▁twenty
+ - ▁ti
+ - ▁curre
+ - play_podcasts
+ - ▁radio
+ - cooking_recipe
+ - transport_query
+ - ▁con
+ - gh
+ - ▁le
+ - lists_query
+ - ▁rem
+ - recommendation_events
+ - house_place
+ - alarm_set
+ - play_audiobook
+ - ist
+ - ase
+ - music_genre
+ - ive
+ - ast
+ - player_setting
+ - ort
+ - lly
+ - news_topic
+ - list_name
+ - ▁playlist
+ - ▁ne
+ - business_type
+ - personal_info
+ - ind
+ - ust
+ - di
+ - ress
+ - recommendation_locations
+ - lists_createoradd
+ - iot_hue_lightoff
+ - lists_remove
+ - ord
+ - ▁light
+ - ere
+ - alarm_query
+ - audio_volume_mute
+ - music_query
+ - ▁audio
+ - rain
+ - ▁date
+ - ▁order
+ - audio_volume_up
+ - ▁ar
+ - ▁podcast
+ - transport_ticket
+ - mail
+ - iot_hue_lightchange
+ - iot_coffee
+ - radio_name
+ - ill
+ - ▁ri
+ - '@'
+ - takeaway_query
+ - song_name
+ - takeaway_order
+ - ▁ra
+ - email_addcontact
+ - play_game
+ - book
+ - transport_traffic
+ - ▁house
+ - music_likeness
+ - her
+ - transport_taxi
+ - iot_hue_lightdim
+ - ment
+ - ght
+ - fo
+ - order_type
+ - color_type
+ - '1'
+ - ven
+ - ould
+ - general_joke
+ - ess
+ - ain
+ - qa_maths
+ - ▁place
+ - ▁twe
+ - cast
+ - iot_cleaning
+ - ▁che
+ - ▁cont
+ - ith
+ - audiobook_name
+ - email_address
+ - game_name
+ - ▁cal
+ - general_frequency
+ - ▁tom
+ - ▁food
+ - act
+ - iot_hue_lightup
+ - '2'
+ - alarm_remove
+ - podcast_descriptor
+ - ▁definition
+ - audio_volume_down
+ - ▁media
+ - email_folder
+ - dia
+ - meal_type
+ - ▁mus
+ - recommendation_movies
+ - ▁ad
+ - ree
+ - pt
+ - now
+ - playlist_name
+ - ▁person
+ - change_amount
+ - ▁pla
+ - escri
+ - datetime_convert
+ - podcast_name
+ - ▁ab
+ - time_zone
+ - ▁def
+ - ting
+ - iot_wemo_on
+ - music_settings
+ - iot_wemo_off
+ - orre
+ - cy
+ - ank
+ - music_descriptor
+ - lar
+ - app_name
+ - row
+ - joke_type
+ - xt
+ - of
+ - ition
+ - ▁meet
+ - ink
+ - ▁confir
+ - transport_agency
+ - general_greet
+ - ▁business
+ - ▁art
+ - ▁ag
+ - urn
+ - escript
+ - rom
+ - ▁rel
+ - ▁au
+ - ▁currency
+ - audio_volume_other
+ - iot_hue_lighton
+ - ▁artist
+ - '?'
+ - ▁bus
+ - cooking_type
+ - movie_name
+ - coffee_type
+ - ingredient
+ - ather
+ - music_dislikeness
+ - sp
+ - q
+ - ▁ser
+ - esc
+ - ▁bir
+ - ▁cur
+ - name
+ - ▁tran
+ - ▁hou
+ - ek
+ - uch
+ - ▁conf
+ - ▁face
+ - '9'
+ - ▁birth
+ - I
+ - sw
+ - transport_descriptor
+ - ▁comm
+ - lease
+ - transport_name
+ - aid
+ - movie_type
+ - ▁device
+ - alarm_type
+ - audiobook_author
+ - '5'
+ - drink_type
+ - ▁joh
+ - ▁defin
+ - word
+ - ▁curren
+ - order
+ - iness
+ - W
+ - cooking_query
+ - sport_type
+ - ▁relation
+ - oint
+ - H
+ - '8'
+ - A
+ - '0'
+ - ▁dol
+ - vice
+ - ▁pers
+ - '&'
+ - T
+ - ▁appoint
+ - _
+ - '7'
+ - '3'
+ - '-'
+ - game_type
+ - ▁pod
+ - N
+ - M
+ - E
+ - list
+ - music_album
+ - dio
+ - ▁transport
+ - qa_query
+ - C
+ - O
+ - U
+ - query_detail
+ - ']'
+ - '['
+ - descriptor
+ - ':'
+ - spon
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: null
+     zero_infinity: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: word
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ aux_ctc_tasks: []
+ frontend: default
+ frontend_conf:
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 30
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_range:
+     - 0
+     - 40
+     num_time_mask: 2
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: e_branchformer
+ encoder_conf:
+     output_size: 512
+     attention_heads: 8
+     attention_layer_type: rel_selfattn
+     pos_enc_layer_type: rel_pos
+     rel_pos_type: latest
+     cgmlp_linear_units: 3072
+     cgmlp_conv_kernel: 31
+     use_linear_after_conv: false
+     gate_activation: identity
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     input_layer: conv2d
+     layer_drop_rate: 0.1
+     linear_units: 1024
+     positionwise_layer_type: linear
+     macaron_ffn: true
+     use_ffn: true
+     merge_conv_kernel: 31
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 8
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.1
+     src_attention_dropout_rate: 0.1
+     layer_drop_rate: 0.2
+ preprocessor: default
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202301'
+ distributed: false
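The `ctc_weight: 0.3` entry under `model_conf` controls how the CTC loss and the attention-decoder loss are mixed in hybrid CTC/attention training. A minimal sketch of that interpolation in plain Python follows; the function name `joint_loss` and the sample loss values are illustrative assumptions, not taken from this run or from ESPnet's source.

```python
def joint_loss(loss_ctc: float, loss_att: float, ctc_weight: float = 0.3) -> float:
    """Interpolate a CTC loss and an attention loss.

    ctc_weight=0.3 mirrors `model_conf.ctc_weight` in config.yaml;
    the remaining 0.7 goes to the attention-decoder loss.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


if __name__ == "__main__":
    # Hypothetical per-batch losses, chosen only to show the weighting.
    print(joint_loss(loss_ctc=10.0, loss_att=20.0))
```

Setting `ctc_weight` to 0 or 1 reduces this to a pure attention or pure CTC objective respectively.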
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/acc.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/backward_time.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/cer.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/cer_ctc.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/forward_time.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/iter_time.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/loss.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/loss_att.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/loss_ctc.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/optim0_lr0.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/optim_step_time.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/train_time.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/wer.png ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/valid.acc.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:138f1491cf079c779fe50fbbb016d2bded08ddc1e3d375075e99d24aa3bb6e31
+ size 441177571
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202301'
+ files:
+   asr_model_file: exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/valid.acc.ave_10best.pth
+ python: "3.9.15 (main, Nov 24 2022, 14:31:59) \n[GCC 11.2.0]"
+ timestamp: 1677546947.945574
+ torch: 1.13.1
+ yaml_files:
+   asr_train_config: exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/config.yaml
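The `timestamp` value in meta.yaml looks like a Unix epoch in seconds; the file does not state the unit, so that is an assumption. Under that assumption, a short standard-library sketch decodes it to a UTC datetime (the names `PACKED_AT` and `to_utc` are hypothetical):

```python
from datetime import datetime, timezone

# Value of `timestamp` copied from meta.yaml; assumed to be seconds since the Unix epoch.
PACKED_AT = 1677546947.945574


def to_utc(epoch_seconds: float) -> datetime:
    """Convert seconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(to_utc(PACKED_AT).isoformat())
```

If the assumption holds, the value falls on 2023-02-28 (UTC), which is consistent with the `espnet: '202301'` version string.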
score.log ADDED
@@ -0,0 +1,46 @@
+ Valid Intent Classification Result
+ 0.8781357882623706
+ Test Intent Classification Result
+ 0.8743691695977979
+ ╒════════════╤═════════════╤══════════╤═════════════╕
+ │ Scenario   │   Precision │   Recall │   F-Measure │
+ ╞════════════╪═════════════╪══════════╪═════════════╡
+ │ OVERALL    │      0.9084 │   0.9084 │      0.9084 │
+ ╘════════════╧═════════════╧══════════╧═════════════╛
+
+ ╒══════════╤═════════════╤══════════╤═════════════╕
+ │ Action   │   Precision │   Recall │   F-Measure │
+ ╞══════════╪═════════════╪══════════╪═════════════╡
+ │ OVERALL  │      0.8852 │   0.8852 │      0.8852 │
+ ╘══════════╧═════════════╧══════════╧═════════════╛
+
+ ╒═════════════════════╤═════════════╤══════════╤═════════════╕
+ │ Intent (scen_act)   │   Precision │   Recall │   F-Measure │
+ ╞═════════════════════╪═════════════╪══════════╪═════════════╡
+ │ OVERALL             │      0.8744 │   0.8744 │      0.8744 │
+ ╘═════════════════════╧═════════════╧══════════╧═════════════╛
+
+ ╒════════════╤═════════════╤══════════╤═════════════╕
+ │ Entities   │   Precision │   Recall │   F-Measure │
+ ╞════════════╪═════════════╪══════════╪═════════════╡
+ │ OVERALL    │      0.7378 │   0.7015 │      0.7192 │
+ ╘════════════╧═════════════╧══════════╧═════════════╛
+
+ ╒════════════════════════════╤═════════════╤══════════╤═════════════╕
+ │ Entities (distance word)   │   Precision │   Recall │   F-Measure │
+ ╞════════════════════════════╪═════════════╪══════════╪═════════════╡
+ │ OVERALL                    │      0.7760 │   0.7418 │      0.7585 │
+ ╘════════════════════════════╧═════════════╧══════════╧═════════════╛
+
+ ╒════════════════════════════╤═════════════╤══════════╤═════════════╕
+ │ Entities (distance char)   │   Precision │   Recall │   F-Measure │
+ ╞════════════════════════════╪═════════════╪══════════╪═════════════╡
+ │ OVERALL                    │      0.8129 │   0.7754 │      0.7937 │
+ ╘════════════════════════════╧═════════════╧══════════╧═════════════╛
+
+ ╒══════════╤═════════════╤══════════╤═════════════╕
+ │ Slu f1   │   Precision │   Recall │   F-Measure │
+ ╞══════════╪═════════════╪══════════╪═════════════╡
+ │ OVERALL  │      0.7940 │   0.7582 │      0.7757 │
+ ╘══════════╧═════════════╧══════════╧═════════════╛
+
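The F-Measure columns in score.log appear consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below recomputes it from the rounded precision and recall values reported for the entity and SLU rows; the helper name `f_measure` and the `REPORTED` table are illustrative, and the scoring script's own implementation is not shown in this commit.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); returns 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall, reported F-Measure) copied from the OVERALL rows of score.log.
REPORTED = {
    "Entities": (0.7378, 0.7015, 0.7192),
    "Entities (distance word)": (0.7760, 0.7418, 0.7585),
    "Entities (distance char)": (0.8129, 0.7754, 0.7937),
    "Slu f1": (0.7940, 0.7582, 0.7757),
}

if __name__ == "__main__":
    for name, (p, r, f) in REPORTED.items():
        print(f"{name}: recomputed {f_measure(p, r):.4f}, reported {f:.4f}")
```

Because the inputs are already rounded to four decimals, agreement is only expected to within rounding error. For the Scenario, Action and Intent rows, precision, recall and F-Measure coincide, so the formula trivially returns the same value.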