pyf98 committed
Commit af98cc0 • 1 Parent(s): b09af25
Files changed (20)
  1. README.md +904 -0
  2. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/RESULTS.md +29 -0
  3. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/config.yaml +806 -0
  4. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/acc.png +0 -0
  5. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/backward_time.png +0 -0
  6. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/cer.png +0 -0
  7. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/cer_ctc.png +0 -0
  8. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/forward_time.png +0 -0
  9. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/gpu_max_cached_mem_GB.png +0 -0
  10. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/iter_time.png +0 -0
  11. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/loss.png +0 -0
  12. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/loss_att.png +0 -0
  13. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/loss_ctc.png +0 -0
  14. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/optim0_lr0.png +0 -0
  15. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/optim_step_time.png +0 -0
  16. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/train_time.png +0 -0
  17. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/wer.png +0 -0
  18. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/score.log +46 -0
  19. exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/valid.acc.ave_10best.pth +3 -0
  20. meta.yaml +8 -0
README.md ADDED
@@ -0,0 +1,904 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: en
+ datasets:
+ - slurp_entity
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `pyf98/slurp_entity_branchformer`
+
+ This model was trained by Yifan Peng using the slurp_entity recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ ```bash
+ cd espnet
+ git checkout 55b6cc387fd0252d1a06db2042fd101bcea7bb34
+ pip install -e .
+ cd egs2/slurp_entity/asr1
+ ./run.sh --skip_data_prep false --skip_train true --download_model pyf98/slurp_entity_branchformer
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Fri May 27 03:41:59 EDT 2022`
+ - python version: `3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0]`
+ - espnet version: `espnet 202204`
+ - pytorch version: `pytorch 1.11.0`
+ - Git hash: `4f36236ed7c8a25c2f869e518614e1ad4a8b50d6`
+ - Commit date: `Thu May 26 00:22:45 2022 -0400`
+
+ ## asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave_10best/devel|8690|178058|83.7|7.6|8.8|2.8|19.2|50.5|
+ |decode_asr_asr_model_valid.acc.ave_10best/test|13078|262176|82.6|7.9|9.5|2.7|20.1|49.2|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave_10best/devel|8690|847400|90.1|3.0|6.9|3.3|13.2|50.5|
+ |decode_asr_asr_model_valid.acc.ave_10best/test|13078|1245475|89.0|3.2|7.8|3.1|14.1|49.2|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 50
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 64
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_en_word/train/speech_shape
+ - exp/asr_stats_raw_en_word/train/text_shape.word
+ valid_shape_file:
+ - exp/asr_stats_raw_en_word/valid/speech_shape
+ - exp/asr_stats_raw_en_word/valid/text_shape.word
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train/wav.scp
+     - speech
+     - kaldi_ark
+ -   - dump/raw/train/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/devel/wav.scp
+     - speech
+     - kaldi_ark
+ -   - dump/raw/devel/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 0.001
+     weight_decay: 1.0e-06
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 35000
+ token_list:
+ - <blank>
+ - <unk>
+ - ▁SEP
+ - ▁FILL
+ - s
+ - ▁the
+ - a
+ - ▁to
+ - ▁i
+ - ▁me
+ - e
+ - ▁s
+ - ▁a
+ - i
+ - ▁you
+ - ▁what
+ - er
+ - ing
+ - u
+ - ▁is
+ - ''''
+ - o
+ - p
+ - ▁in
+ - ▁p
+ - y
+ - ▁my
+ - ▁please
+ - d
+ - c
+ - m
+ - ▁b
+ - l
+ - ▁m
+ - ▁c
+ - st
+ - date
+ - n
+ - ▁d
+ - le
+ - b
+ - ▁for
+ - re
+ - t
+ - ▁on
+ - en
+ - h
+ - 'on'
+ - ar
+ - person
+ - ▁re
+ - ▁f
+ - ▁g
+ - ▁of
+ - an
+ - ▁
+ - g
+ - ▁today
+ - ▁t
+ - or
+ - ▁it
+ - ▁this
+ - ▁h
+ - r
+ - f
+ - at
+ - ch
+ - ce
+ - place_name
+ - ▁email
+ - ▁do
+ - es
+ - ri
+ - ▁e
+ - ▁w
+ - ic
+ - in
+ - ▁that
+ - event_name
+ - ▁play
+ - ▁and
+ - al
+ - ▁n
+ - ▁can
+ - email_query
+ - ve
+ - ▁new
+ - day
+ - it
+ - ate
+ - ▁from
+ - ▁have
+ - k
+ - time
+ - ▁am
+ - media_type
+ - email_sendemail
+ - ent
+ - ▁olly
+ - qa_factoid
+ - se
+ - v
+ - et
+ - ck
+ - ▁any
+ - calendar_set
+ - ly
+ - th
+ - ▁how
+ - ▁meeting
+ - ed
+ - ▁tell
+ - ▁st
+ - x
+ - ur
+ - ro
+ - ▁at
+ - nd
+ - ▁list
+ - w
+ - ▁u
+ - ou
+ - ▁not
+ - ▁about
+ - ▁an
+ - ▁o
+ - general_negate
+ - ut
+ - ▁time
+ - ▁be
+ - ▁ch
+ - ▁are
+ - social_post
+ - business_name
+ - la
+ - ty
+ - play_music
+ - ot
+ - general_quirky
+ - ▁l
+ - ▁sh
+ - ▁tweet
+ - om
+ - ▁week
+ - um
+ - ▁one
+ - ter
+ - ▁he
+ - ▁up
+ - ▁com
+ - general_praise
+ - weather_query
+ - ▁next
+ - ▁th
+ - ▁check
+ - calendar_query
+ - ▁last
+ - ▁ro
+ - ad
+ - is
+ - ▁with
+ - ay
+ - ▁send
+ - pe
+ - ▁pm
+ - ▁tomorrow
+ - ▁j
+ - un
+ - ▁train
+ - general_explain
+ - ▁v
+ - one
+ - ▁r
+ - ra
+ - news_query
+ - ation
+ - ▁emails
+ - us
+ - if
+ - ct
+ - ▁co
+ - ▁add
+ - ▁will
+ - ▁se
+ - nt
+ - ▁was
+ - ine
+ - ▁de
+ - ▁set
+ - ▁ex
+ - ▁would
+ - ir
+ - ow
+ - ber
+ - general_repeat
+ - ight
+ - ook
+ - ▁again
+ - ▁song
+ - currency_name
+ - ll
+ - ▁ha
+ - ▁go
+ - relation
+ - te
+ - ion
+ - and
+ - ▁y
+ - ▁ye
+ - general_affirm
+ - general_confirm
+ - ery
+ - ▁po
+ - ff
+ - ▁we
+ - ▁turn
+ - ▁did
+ - ▁mar
+ - ▁alarm
+ - ▁like
+ - datetime_query
+ - ers
+ - ▁all
+ - ▁remind
+ - ▁so
+ - qa_definition
+ - ▁calendar
+ - end
+ - ▁said
+ - ci
+ - ▁off
+ - ▁john
+ - ▁day
+ - ss
+ - pla
+ - ume
+ - ▁get
+ - ail
+ - pp
+ - z
+ - ry
+ - am
+ - ▁need
+ - as
+ - ▁thank
+ - ▁wh
+ - ▁want
+ - ▁right
+ - ▁jo
+ - ▁facebook
+ - ▁k
+ - ge
+ - ld
+ - ▁fri
+ - ▁two
+ - general_dontcare
+ - ▁news
+ - ol
+ - oo
+ - ant
+ - ▁five
+ - ▁event
+ - ake
+ - definition_word
+ - transport_type
+ - ▁your
+ - vi
+ - orn
+ - op
+ - ▁weather
+ - ome
+ - ▁app
+ - ▁lo
+ - de
+ - ▁music
+ - weather_descriptor
+ - ak
+ - ke
+ - ▁there
+ - ▁si
+ - ▁lights
+ - ▁now
+ - ▁mo
+ - calendar_remove
+ - our
+ - ▁dollar
+ - food_type
+ - me
+ - ▁more
+ - ▁no
+ - ▁birthday
+ - orrect
+ - ▁rep
+ - ▁show
+ - play_radio
+ - ▁mon
+ - ▁does
+ - ood
+ - ag
+ - li
+ - ▁sto
+ - ▁contact
+ - cket
+ - email_querycontact
+ - ▁ev
+ - ▁could
+ - ange
+ - ▁just
+ - out
+ - ame
+ - .
+ - ▁ja
+ - ▁confirm
+ - qa_currency
+ - ▁man
+ - ▁late
+ - ▁think
+ - ▁some
+ - timeofday
+ - ▁bo
+ - qa_stock
+ - ong
+ - ▁start
+ - ▁work
+ - ▁ten
+ - int
+ - ▁command
+ - all
+ - ▁make
+ - ▁la
+ - j
+ - ▁answ
+ - ▁hour
+ - ▁cle
+ - ah
+ - ▁find
+ - ▁service
+ - ▁fa
+ - qu
+ - general_commandstop
+ - ai
+ - ▁when
+ - ▁te
+ - ▁by
+ - social_query
+ - ard
+ - ▁tw
+ - ul
+ - id
+ - ▁seven
+ - ▁where
+ - ▁much
+ - art
+ - ▁appointment
+ - ver
+ - artist_name
+ - el
+ - device_type
+ - ▁know
+ - ▁three
+ - ▁events
+ - ▁tr
+ - ▁li
+ - ork
+ - red
+ - ect
+ - ▁let
+ - ▁respon
+ - ▁par
+ - zz
+ - ▁give
+ - ▁twenty
+ - ▁ti
+ - ▁curre
+ - play_podcasts
+ - ▁radio
+ - cooking_recipe
+ - transport_query
+ - ▁con
+ - gh
+ - ▁le
+ - lists_query
+ - ▁rem
+ - recommendation_events
+ - house_place
+ - alarm_set
+ - play_audiobook
+ - ist
+ - ase
+ - music_genre
+ - ive
+ - ast
+ - player_setting
+ - ort
+ - lly
+ - news_topic
+ - list_name
+ - ▁playlist
+ - ▁ne
+ - business_type
+ - personal_info
+ - ind
+ - ust
+ - di
+ - ress
+ - recommendation_locations
+ - lists_createoradd
+ - iot_hue_lightoff
+ - lists_remove
+ - ord
+ - ▁light
+ - ere
+ - alarm_query
+ - audio_volume_mute
+ - music_query
+ - ▁audio
+ - rain
+ - ▁date
+ - ▁order
+ - audio_volume_up
+ - ▁ar
+ - ▁podcast
+ - transport_ticket
+ - mail
+ - iot_hue_lightchange
+ - iot_coffee
+ - radio_name
+ - ill
+ - ▁ri
+ - '@'
+ - takeaway_query
+ - song_name
+ - takeaway_order
+ - ▁ra
+ - email_addcontact
+ - play_game
+ - book
+ - transport_traffic
+ - ▁house
+ - music_likeness
+ - her
+ - transport_taxi
+ - iot_hue_lightdim
+ - ment
+ - ght
+ - fo
+ - order_type
+ - color_type
+ - '1'
+ - ven
+ - ould
+ - general_joke
+ - ess
+ - ain
+ - qa_maths
+ - ▁place
+ - ▁twe
+ - cast
+ - iot_cleaning
+ - ▁che
+ - ▁cont
+ - ith
+ - audiobook_name
+ - email_address
+ - game_name
+ - ▁cal
+ - general_frequency
+ - ▁tom
+ - ▁food
+ - act
+ - iot_hue_lightup
+ - '2'
+ - alarm_remove
+ - podcast_descriptor
+ - ▁definition
+ - audio_volume_down
+ - ▁media
+ - email_folder
+ - dia
+ - meal_type
+ - ▁mus
+ - recommendation_movies
+ - ▁ad
+ - ree
+ - pt
+ - now
+ - playlist_name
+ - ▁person
+ - change_amount
+ - ▁pla
+ - escri
+ - datetime_convert
+ - podcast_name
+ - ▁ab
+ - time_zone
+ - ▁def
+ - ting
+ - iot_wemo_on
+ - music_settings
+ - iot_wemo_off
+ - orre
+ - cy
+ - ank
+ - music_descriptor
+ - lar
+ - app_name
+ - row
+ - joke_type
+ - xt
+ - of
+ - ition
+ - ▁meet
+ - ink
+ - ▁confir
+ - transport_agency
+ - general_greet
+ - ▁business
+ - ▁art
+ - ▁ag
+ - urn
+ - escript
+ - rom
+ - ▁rel
+ - ▁au
+ - ▁currency
+ - audio_volume_other
+ - iot_hue_lighton
+ - ▁artist
+ - '?'
+ - ▁bus
+ - cooking_type
+ - movie_name
+ - coffee_type
+ - ingredient
+ - ather
+ - music_dislikeness
+ - sp
+ - q
+ - ▁ser
+ - esc
+ - ▁bir
+ - ▁cur
+ - name
+ - ▁tran
+ - ▁hou
+ - ek
+ - uch
+ - ▁conf
+ - ▁face
+ - '9'
+ - ▁birth
+ - I
+ - sw
+ - transport_descriptor
+ - ▁comm
+ - lease
+ - transport_name
+ - aid
+ - movie_type
+ - ▁device
+ - alarm_type
+ - audiobook_author
+ - '5'
+ - drink_type
+ - ▁joh
+ - ▁defin
+ - word
+ - ▁curren
+ - order
+ - iness
+ - W
+ - cooking_query
+ - sport_type
+ - ▁relation
+ - oint
+ - H
+ - '8'
+ - A
+ - '0'
+ - ▁dol
+ - vice
+ - ▁pers
+ - '&'
+ - T
+ - ▁appoint
+ - _
+ - '7'
+ - '3'
+ - '-'
+ - game_type
+ - ▁pod
+ - N
+ - M
+ - E
+ - list
+ - music_album
+ - dio
+ - ▁transport
+ - qa_query
+ - C
+ - O
+ - U
+ - query_detail
+ - ']'
+ - '['
+ - descriptor
+ - ':'
+ - spon
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: word
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: default
+ frontend_conf:
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 30
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_range:
+     - 0
+     - 40
+     num_time_mask: 2
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: branchformer
+ encoder_conf:
+     output_size: 512
+     use_attn: true
+     attention_heads: 8
+     attention_layer_type: rel_selfattn
+     pos_enc_layer_type: rel_pos
+     rel_pos_type: latest
+     use_cgmlp: true
+     cgmlp_linear_units: 2048
+     cgmlp_conv_kernel: 31
+     use_linear_after_conv: false
+     gate_activation: identity
+     merge_method: concat
+     cgmlp_weight: 0.5
+     attn_branch_drop_rate: 0.0
+     num_blocks: 18
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     input_layer: conv2d
+     stochastic_depth_rate: 0.0
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 8
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.1
+     src_attention_dropout_rate: 0.1
+ required:
+ - output_dir
+ - token_list
+ version: '202204'
+ distributed: false
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
+ year={2018},
+ booktitle={Proceedings of Interspeech},
+ pages={2207--2211},
+ doi={10.21437/Interspeech.2018-1456},
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+ title={ESPnet: End-to-End Speech Processing Toolkit},
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ year={2018},
+ eprint={1804.00015},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
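The `scheduler: warmuplr` entry in the README config (with `lr: 0.001` and `warmup_steps: 35000`) determines how the learning rate evolves during training. The following is a minimal Python sketch, assuming ESPnet2's `WarmupLR` follows the Noam-style rule lr = base_lr · warmup_steps^0.5 · min(step^-0.5, step · warmup_steps^-1.5). Under that assumption the rate grows linearly to its peak of `lr` at step 35000 and then decays with the inverse square root of the step. The helper name `warmup_lr` is hypothetical and not part of ESPnet.

```python
def warmup_lr(step: int, base_lr: float = 1e-3, warmup_steps: int = 35000) -> float:
    """Learning rate at a 1-indexed optimizer step under a Noam-style warmup rule.

    Assumption: mirrors the formula used by ESPnet2's WarmupLR scheduler;
    this helper itself is hypothetical and not an ESPnet API.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear warmup term (step * warmup_steps**-1.5) vs. inverse-sqrt decay term (step**-0.5).
    # The two terms are equal exactly at step == warmup_steps, where lr == base_lr.
    return base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)


if __name__ == "__main__":
    # Defaults match optim_conf.lr and scheduler_conf.warmup_steps from the config.
    for s in (1000, 17500, 35000, 70000, 140000):
        print(f"step {s:>6}: lr = {warmup_lr(s):.6f}")
```

With these defaults the rate reaches exactly 0.001 at step 35000 and falls back to 0.0005 by step 140000 (four times the warmup length), since the decay branch scales as the square root of warmup_steps / step.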
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/RESULTS.md ADDED
@@ -0,0 +1,29 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Fri May 27 03:41:59 EDT 2022`
+ - python version: `3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0]`
+ - espnet version: `espnet 202204`
+ - pytorch version: `pytorch 1.11.0`
+ - Git hash: `4f36236ed7c8a25c2f869e518614e1ad4a8b50d6`
+ - Commit date: `Thu May 26 00:22:45 2022 -0400`
+
+ ## asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave_10best/devel|8690|178058|83.7|7.6|8.8|2.8|19.2|50.5|
+ |decode_asr_asr_model_valid.acc.ave_10best/test|13078|262176|82.6|7.9|9.5|2.7|20.1|49.2|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave_10best/devel|8690|847400|90.1|3.0|6.9|3.3|13.2|50.5|
+ |decode_asr_asr_model_valid.acc.ave_10best/test|13078|1245475|89.0|3.2|7.8|3.1|14.1|49.2|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
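In the WER and CER tables, `Sub`, `Del`, and `Ins` are substitution, deletion, and insertion rates given as percentages of the reference length (`Wrd`). Assuming the standard sclite-style definitions, this means `Err` = `Sub` + `Del` + `Ins` and `Corr` = 100 - `Sub` - `Del`. The minimal Python sketch below checks the published rows against those identities; the tolerance of 0.2 absorbs the rounding of each figure to one decimal place. `ScoreRow` and `is_consistent` are illustrative names, not part of ESPnet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRow:
    """One row of a sclite-style summary table (all rates in percent)."""
    name: str
    corr: float
    sub: float
    dele: float  # "del" is a Python keyword, so the Del column is stored as `dele`
    ins: float
    err: float


# Rows copied from the WER and CER tables in RESULTS.md.
ROWS = [
    ScoreRow("WER devel", 83.7, 7.6, 8.8, 2.8, 19.2),
    ScoreRow("WER test", 82.6, 7.9, 9.5, 2.7, 20.1),
    ScoreRow("CER devel", 90.1, 3.0, 6.9, 3.3, 13.2),
    ScoreRow("CER test", 89.0, 3.2, 7.8, 3.1, 14.1),
]


def is_consistent(row: ScoreRow, tol: float = 0.2) -> bool:
    """Check Err = Sub + Del + Ins and Corr = 100 - Sub - Del, up to rounding."""
    err_ok = abs(row.sub + row.dele + row.ins - row.err) <= tol
    corr_ok = abs(100.0 - row.sub - row.dele - row.corr) <= tol
    return err_ok and corr_ok


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.name}: consistent={is_consistent(row)}")
```

All four rows pass. The only visible rounding effect is the devel WER row, where 100 - 7.6 - 8.8 = 83.6 against the reported `Corr` of 83.7.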
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/config.yaml ADDED
@@ -0,0 +1,806 @@
1
+ config: conf/tuning/train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k.yaml
2
+ print_config: false
3
+ log_level: INFO
4
+ dry_run: false
5
+ iterator_type: sequence
6
+ output_dir: exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word
7
+ ngpu: 1
8
+ seed: 0
9
+ num_workers: 1
10
+ num_att_plot: 3
11
+ dist_backend: nccl
12
+ dist_init_method: env://
13
+ dist_world_size: null
14
+ dist_rank: null
15
+ local_rank: 0
16
+ dist_master_addr: null
17
+ dist_master_port: null
18
+ dist_launcher: null
19
+ multiprocessing_distributed: false
20
+ unused_parameters: false
21
+ sharded_ddp: false
22
+ cudnn_enabled: true
23
+ cudnn_benchmark: false
24
+ cudnn_deterministic: true
25
+ collect_stats: false
26
+ write_collected_feats: false
27
+ max_epoch: 50
28
+ patience: null
29
+ val_scheduler_criterion:
30
+ - valid
31
+ - loss
32
+ early_stopping_criterion:
33
+ - valid
34
+ - loss
35
+ - min
36
+ best_model_criterion:
37
+ - - valid
38
+ - acc
39
+ - max
40
+ keep_nbest_models: 10
41
+ nbest_averaging_interval: 0
42
+ grad_clip: 5.0
43
+ grad_clip_type: 2.0
44
+ grad_noise: false
45
+ accum_grad: 1
46
+ no_forward_run: false
47
+ resume: true
48
+ train_dtype: float32
49
+ use_amp: false
50
+ log_interval: null
51
+ use_matplotlib: true
52
+ use_tensorboard: true
53
+ use_wandb: false
54
+ wandb_project: null
55
+ wandb_id: null
56
+ wandb_entity: null
57
+ wandb_name: null
58
+ wandb_model_log_interval: -1
59
+ detect_anomaly: false
60
+ pretrain_path: null
61
+ init_param: []
62
+ ignore_init_mismatch: false
63
+ freeze_param: []
64
+ num_iters_per_epoch: null
65
+ batch_size: 64
66
+ valid_batch_size: null
67
+ batch_bins: 1000000
68
+ valid_batch_bins: null
69
+ train_shape_file:
70
+ - exp/asr_stats_raw_en_word/train/speech_shape
71
+ - exp/asr_stats_raw_en_word/train/text_shape.word
72
+ valid_shape_file:
73
+ - exp/asr_stats_raw_en_word/valid/speech_shape
74
+ - exp/asr_stats_raw_en_word/valid/text_shape.word
75
+ batch_type: folded
76
+ valid_batch_type: null
77
+ fold_length:
78
+ - 80000
79
+ - 150
80
+ sort_in_batch: descending
81
+ sort_batch: descending
82
+ multiple_iterator: false
83
+ chunk_length: 500
84
+ chunk_shift_ratio: 0.5
85
+ num_cache_chunks: 1024
86
+ train_data_path_and_name_and_type:
87
+ - - dump/raw/train/wav.scp
88
+ - speech
89
+ - kaldi_ark
90
+ - - dump/raw/train/text
91
+ - text
92
+ - text
93
+ valid_data_path_and_name_and_type:
94
+ - - dump/raw/devel/wav.scp
95
+ - speech
96
+ - kaldi_ark
97
+ - - dump/raw/devel/text
98
+ - text
99
+ - text
100
+ allow_variable_data_keys: false
101
+ max_cache_size: 0.0
102
+ max_cache_fd: 32
103
+ valid_max_cache_size: null
104
+ optim: adam
105
+ optim_conf:
106
+ lr: 0.001
107
+ weight_decay: 1.0e-06
108
+ scheduler: warmuplr
109
+ scheduler_conf:
110
+ warmup_steps: 35000
111
+ token_list:
112
+ - <blank>
113
+ - <unk>
114
+ - ▁SEP
115
+ - ▁FILL
116
+ - s
117
+ - ▁the
118
+ - a
119
+ - ▁to
120
+ - ▁i
121
+ - ▁me
122
+ - e
123
+ - ▁s
124
+ - ▁a
125
+ - i
126
+ - ▁you
127
+ - ▁what
128
+ - er
129
+ - ing
130
+ - u
131
+ - ▁is
132
+ - ''''
133
+ - o
134
+ - p
135
+ - ▁in
136
+ - ▁p
137
+ - y
138
+ - ▁my
139
+ - ▁please
140
+ - d
141
+ - c
142
+ - m
143
+ - ▁b
144
+ - l
145
+ - ▁m
146
+ - ▁c
147
+ - st
148
+ - date
149
+ - n
150
+ - ▁d
151
+ - le
152
+ - b
153
+ - ▁for
154
+ - re
155
+ - t
156
+ - ▁on
157
+ - en
158
+ - h
159
+ - 'on'
160
+ - ar
161
+ - person
162
+ - ▁re
163
+ - ▁f
164
+ - ▁g
165
+ - ▁of
166
+ - an
167
+ - ▁
168
+ - g
169
+ - ▁today
170
+ - ▁t
171
+ - or
172
+ - ▁it
173
+ - ▁this
174
+ - ▁h
175
+ - r
176
+ - f
177
+ - at
178
+ - ch
179
+ - ce
180
+ - place_name
181
+ - ▁email
182
+ - ▁do
183
+ - es
184
+ - ri
185
+ - ▁e
186
+ - ▁w
187
+ - ic
188
+ - in
189
+ - ▁that
190
+ - event_name
191
+ - ▁play
192
+ - ▁and
193
+ - al
194
+ - ▁n
195
+ - ▁can
196
+ - email_query
197
+ - ve
198
+ - ▁new
199
+ - day
200
+ - it
201
+ - ate
202
+ - ▁from
203
+ - ▁have
204
+ - k
205
+ - time
206
+ - ▁am
207
+ - media_type
208
+ - email_sendemail
209
+ - ent
210
+ - ▁olly
211
+ - qa_factoid
212
+ - se
213
+ - v
214
+ - et
215
+ - ck
216
+ - ▁any
217
+ - calendar_set
218
+ - ly
219
+ - th
220
+ - ▁how
221
+ - ▁meeting
222
+ - ed
223
+ - ▁tell
224
+ - ▁st
225
+ - x
226
+ - ur
227
+ - ro
228
+ - ▁at
229
+ - nd
230
+ - ▁list
231
+ - w
232
+ - ▁u
233
+ - ou
234
+ - ▁not
235
+ - ▁about
236
+ - ▁an
237
+ - ▁o
238
+ - general_negate
239
+ - ut
240
+ - ▁time
241
+ - ▁be
242
+ - ▁ch
243
+ - ▁are
244
+ - social_post
245
+ - business_name
246
+ - la
247
+ - ty
248
+ - play_music
249
+ - ot
250
+ - general_quirky
251
+ - ▁l
252
+ - ▁sh
253
+ - ▁tweet
254
+ - om
255
+ - ▁week
256
+ - um
257
+ - ▁one
258
+ - ter
259
+ - ▁he
260
+ - ▁up
261
+ - ▁com
262
+ - general_praise
263
+ - weather_query
264
+ - ▁next
265
+ - ▁th
266
+ - ▁check
267
+ - calendar_query
268
+ - ▁last
269
+ - ▁ro
270
+ - ad
271
+ - is
272
+ - ▁with
273
+ - ay
274
+ - ▁send
275
+ - pe
276
+ - ▁pm
277
+ - ▁tomorrow
278
+ - ▁j
279
+ - un
280
+ - ▁train
281
+ - general_explain
282
+ - ▁v
283
+ - one
284
+ - ▁r
285
+ - ra
286
+ - news_query
287
+ - ation
288
+ - ▁emails
289
+ - us
290
+ - if
291
+ - ct
292
+ - ▁co
293
+ - ▁add
294
+ - ▁will
295
+ - ▁se
296
+ - nt
297
+ - ▁was
298
+ - ine
299
+ - ▁de
300
+ - ▁set
301
+ - ▁ex
302
+ - ▁would
303
+ - ir
304
+ - ow
305
+ - ber
306
+ - general_repeat
307
+ - ight
308
+ - ook
309
+ - ▁again
310
+ - ▁song
311
+ - currency_name
312
+ - ll
313
+ - ▁ha
314
+ - ▁go
315
+ - relation
316
+ - te
317
+ - ion
318
+ - and
319
+ - ▁y
320
+ - ▁ye
321
+ - general_affirm
322
+ - general_confirm
323
+ - ery
324
+ - ▁po
325
+ - ff
326
+ - ▁we
327
+ - ▁turn
328
+ - ▁did
329
+ - ▁mar
330
+ - ▁alarm
331
+ - ▁like
332
+ - datetime_query
333
+ - ers
334
+ - ▁all
335
+ - ▁remind
336
+ - ▁so
337
+ - qa_definition
338
+ - ▁calendar
339
+ - end
340
+ - ▁said
341
+ - ci
342
+ - ▁off
343
+ - ▁john
344
+ - ▁day
345
+ - ss
346
+ - pla
347
+ - ume
348
+ - ▁get
349
+ - ail
350
+ - pp
351
+ - z
352
+ - ry
353
+ - am
354
+ - ▁need
355
+ - as
356
+ - ▁thank
357
+ - ▁wh
358
+ - ▁want
359
+ - ▁right
360
+ - ▁jo
361
+ - ▁facebook
362
+ - ▁k
363
+ - ge
364
+ - ld
365
+ - ▁fri
366
+ - ▁two
367
+ - general_dontcare
368
+ - ▁news
369
+ - ol
370
+ - oo
371
+ - ant
372
+ - ▁five
373
+ - ▁event
374
+ - ake
375
+ - definition_word
376
+ - transport_type
377
+ - ▁your
378
+ - vi
379
+ - orn
380
+ - op
381
+ - ▁weather
382
+ - ome
383
+ - ▁app
384
+ - ▁lo
385
+ - de
386
+ - ▁music
387
+ - weather_descriptor
388
+ - ak
389
+ - ke
390
+ - ▁there
391
+ - ▁si
392
+ - ▁lights
393
+ - ▁now
394
+ - ▁mo
395
+ - calendar_remove
396
+ - our
397
+ - ▁dollar
398
+ - food_type
399
+ - me
400
+ - ▁more
401
+ - ▁no
402
+ - ▁birthday
403
+ - orrect
404
+ - ▁rep
405
+ - ▁show
406
+ - play_radio
407
+ - ▁mon
408
+ - ▁does
409
+ - ood
410
+ - ag
411
+ - li
412
+ - ▁sto
413
+ - ▁contact
414
+ - cket
415
+ - email_querycontact
416
+ - ▁ev
417
+ - ▁could
418
+ - ange
419
+ - ▁just
420
+ - out
421
+ - ame
422
+ - .
423
+ - ▁ja
424
+ - ▁confirm
425
+ - qa_currency
426
+ - ▁man
427
+ - ▁late
428
+ - ▁think
429
+ - ▁some
430
+ - timeofday
431
+ - ▁bo
432
+ - qa_stock
433
+ - ong
434
+ - ▁start
435
+ - ▁work
436
+ - ▁ten
437
+ - int
438
+ - ▁command
439
+ - all
440
+ - ▁make
441
+ - ▁la
442
+ - j
443
+ - ▁answ
444
+ - ▁hour
+ - ▁cle
+ - ah
+ - ▁find
+ - ▁service
+ - ▁fa
+ - qu
+ - general_commandstop
+ - ai
+ - ▁when
+ - ▁te
+ - ▁by
+ - social_query
+ - ard
+ - ▁tw
+ - ul
+ - id
+ - ▁seven
+ - ▁where
+ - ▁much
+ - art
+ - ▁appointment
+ - ver
+ - artist_name
+ - el
+ - device_type
+ - ▁know
+ - ▁three
+ - ▁events
+ - ▁tr
+ - ▁li
+ - ork
+ - red
+ - ect
+ - ▁let
+ - ▁respon
+ - ▁par
+ - zz
+ - ▁give
+ - ▁twenty
+ - ▁ti
+ - ▁curre
+ - play_podcasts
+ - ▁radio
+ - cooking_recipe
+ - transport_query
+ - ▁con
+ - gh
+ - ▁le
+ - lists_query
+ - ▁rem
+ - recommendation_events
+ - house_place
+ - alarm_set
+ - play_audiobook
+ - ist
+ - ase
+ - music_genre
+ - ive
+ - ast
+ - player_setting
+ - ort
+ - lly
+ - news_topic
+ - list_name
+ - ▁playlist
+ - ▁ne
+ - business_type
+ - personal_info
+ - ind
+ - ust
+ - di
+ - ress
+ - recommendation_locations
+ - lists_createoradd
+ - iot_hue_lightoff
+ - lists_remove
+ - ord
+ - ▁light
+ - ere
+ - alarm_query
+ - audio_volume_mute
+ - music_query
+ - ▁audio
+ - rain
+ - ▁date
+ - ▁order
+ - audio_volume_up
+ - ▁ar
+ - ▁podcast
+ - transport_ticket
+ - mail
+ - iot_hue_lightchange
+ - iot_coffee
+ - radio_name
+ - ill
+ - ▁ri
+ - '@'
+ - takeaway_query
+ - song_name
+ - takeaway_order
+ - ▁ra
+ - email_addcontact
+ - play_game
+ - book
+ - transport_traffic
+ - ▁house
+ - music_likeness
+ - her
+ - transport_taxi
+ - iot_hue_lightdim
+ - ment
+ - ght
+ - fo
+ - order_type
+ - color_type
+ - '1'
+ - ven
+ - ould
+ - general_joke
+ - ess
+ - ain
+ - qa_maths
+ - ▁place
+ - ▁twe
+ - cast
+ - iot_cleaning
+ - ▁che
+ - ▁cont
+ - ith
+ - audiobook_name
+ - email_address
+ - game_name
+ - ▁cal
+ - general_frequency
+ - ▁tom
+ - ▁food
+ - act
+ - iot_hue_lightup
+ - '2'
+ - alarm_remove
+ - podcast_descriptor
+ - ▁definition
+ - audio_volume_down
+ - ▁media
+ - email_folder
+ - dia
+ - meal_type
+ - ▁mus
+ - recommendation_movies
+ - ▁ad
+ - ree
+ - pt
+ - now
+ - playlist_name
+ - ▁person
+ - change_amount
+ - ▁pla
+ - escri
+ - datetime_convert
+ - podcast_name
+ - ▁ab
+ - time_zone
+ - ▁def
+ - ting
+ - iot_wemo_on
+ - music_settings
+ - iot_wemo_off
+ - orre
+ - cy
+ - ank
+ - music_descriptor
+ - lar
+ - app_name
+ - row
+ - joke_type
+ - xt
+ - of
+ - ition
+ - ▁meet
+ - ink
+ - ▁confir
+ - transport_agency
+ - general_greet
+ - ▁business
+ - ▁art
+ - ▁ag
+ - urn
+ - escript
+ - rom
+ - ▁rel
+ - ▁au
+ - ▁currency
+ - audio_volume_other
+ - iot_hue_lighton
+ - ▁artist
+ - '?'
+ - ▁bus
+ - cooking_type
+ - movie_name
+ - coffee_type
+ - ingredient
+ - ather
+ - music_dislikeness
+ - sp
+ - q
+ - ▁ser
+ - esc
+ - ▁bir
+ - ▁cur
+ - name
+ - ▁tran
+ - ▁hou
+ - ek
+ - uch
+ - ▁conf
+ - ▁face
+ - '9'
+ - ▁birth
+ - I
+ - sw
+ - transport_descriptor
+ - ▁comm
+ - lease
+ - transport_name
+ - aid
+ - movie_type
+ - ▁device
+ - alarm_type
+ - audiobook_author
+ - '5'
+ - drink_type
+ - ▁joh
+ - ▁defin
+ - word
+ - ▁curren
+ - order
+ - iness
+ - W
+ - cooking_query
+ - sport_type
+ - ▁relation
+ - oint
+ - H
+ - '8'
+ - A
+ - '0'
+ - ▁dol
+ - vice
+ - ▁pers
+ - '&'
+ - T
+ - ▁appoint
+ - _
+ - '7'
+ - '3'
+ - '-'
+ - game_type
+ - ▁pod
+ - N
+ - M
+ - E
+ - list
+ - music_album
+ - dio
+ - ▁transport
+ - qa_query
+ - C
+ - O
+ - U
+ - query_detail
+ - ']'
+ - '['
+ - descriptor
+ - ':'
+ - spon
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: word
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: default
+ frontend_conf:
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 30
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_range:
+     - 0
+     - 40
+     num_time_mask: 2
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: branchformer
+ encoder_conf:
+     output_size: 512
+     use_attn: true
+     attention_heads: 8
+     attention_layer_type: rel_selfattn
+     pos_enc_layer_type: rel_pos
+     rel_pos_type: latest
+     use_cgmlp: true
+     cgmlp_linear_units: 2048
+     cgmlp_conv_kernel: 31
+     use_linear_after_conv: false
+     gate_activation: identity
+     merge_method: concat
+     cgmlp_weight: 0.5
+     attn_branch_drop_rate: 0.0
+     num_blocks: 18
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     input_layer: conv2d
+     stochastic_depth_rate: 0.0
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 8
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.1
+     src_attention_dropout_rate: 0.1
+ required:
+ - output_dir
+ - token_list
+ version: '202204'
+ distributed: false
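The `specaug_conf` block in the config above enables SpecAugment-style frequency and time masking. The pure-Python sketch below illustrates what those settings mean. It is not ESPnet's own `SpecAug` code. It zeroes `num_freq_mask: 2` frequency bands and `num_time_mask: 2` time spans, with widths drawn from `freq_mask_width_range` ([0, 30]) and `time_mask_width_range` ([0, 40]). Several details are assumptions made for illustration:

- the helper name `spec_augment_mask`;
- the 200-frame by 80-bin feature size;
- treating the upper width bound as exclusive;
- filling masked cells with zero.

Time warping (`apply_time_warp`, `time_warp_window: 5`) is left out.

```python
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def spec_augment_mask(
    feats: Matrix,
    freq_width_range: Tuple[int, int] = (0, 30),  # freq_mask_width_range
    num_freq_mask: int = 2,                       # num_freq_mask
    time_width_range: Tuple[int, int] = (0, 40),  # time_mask_width_range
    num_time_mask: int = 2,                       # num_time_mask
    rng: Optional[random.Random] = None,
) -> Matrix:
    """Return a copy of `feats` (frames x bins) with random bands set to 0.0.

    Hypothetical helper: no time warping, upper width bound exclusive.
    """
    if rng is None:
        rng = random.Random()
    out = [row[:] for row in feats]  # copy so the input is left untouched
    num_frames, num_bins = len(out), len(out[0])

    # Frequency masks: the same bins are zeroed in every frame.
    for _ in range(num_freq_mask):
        width = rng.randrange(*freq_width_range)
        start = rng.randrange(max(1, num_bins - width))
        for row in out:
            for b in range(start, min(start + width, num_bins)):
                row[b] = 0.0

    # Time masks: whole frames are zeroed.
    for _ in range(num_time_mask):
        width = rng.randrange(*time_width_range)
        start = rng.randrange(max(1, num_frames - width))
        for t in range(start, min(start + width, num_frames)):
            out[t] = [0.0] * num_bins

    return out


if __name__ == "__main__":
    feats = [[1.0] * 80 for _ in range(200)]  # assumed 200 frames x 80 bins
    masked = spec_augment_mask(feats, rng=random.Random(0))
    zeroed = sum(v == 0.0 for row in masked for v in row)
    print(f"zeroed {zeroed} of {200 * 80} values")
```

Because widths may be drawn as 0, a given utterance can receive fewer effective masks than configured, which keeps the augmentation mild on average.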
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/acc.png ADDED
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/backward_time.png ADDED
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/cer.png ADDED
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/cer_ctc.png ADDED
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/forward_time.png ADDED
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/iter_time.png ADDED
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/loss.png ADDED
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/loss_att.png ADDED
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/loss_ctc.png ADDED
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/optim0_lr0.png ADDED
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/optim_step_time.png ADDED
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/train_time.png ADDED
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/images/wer.png ADDED
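The `loss`, `loss_att` and `loss_ctc` plots listed above track the hybrid CTC/attention objective that `ctc_weight: 0.3` in `model_conf` configures. ESPnet's `ESPnetASRModel` combines the two terms as `ctc_weight * loss_ctc + (1 - ctc_weight) * loss_att`. The sketch below reproduces only that interpolation. The function name `hybrid_loss` and the example loss values are illustrative and are not taken from this run.

```python
def hybrid_loss(loss_ctc: float, loss_att: float, ctc_weight: float = 0.3) -> float:
    """Interpolate CTC and attention-decoder losses using ctc_weight from model_conf."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


if __name__ == "__main__":
    # Made-up batch losses, not values from this run's logs.
    print(round(hybrid_loss(loss_ctc=20.0, loss_att=10.0), 6))  # prints 13.0
```

With `ctc_weight: 0.3`, the attention-decoder loss carries 70% of the weight. The `lsm_weight: 0.1` label smoothing applies to that attention term, not to CTC.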
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/score.log ADDED
@@ -0,0 +1,46 @@
+ Valid Intent Classification Result
+ 0.8727272727272728
+ Test Intent Classification Result
+ 0.8653463832390274
+ ╒════════════╤═════════════╤══════════╤═════════════╕
+ │ Scenario   │   Precision │   Recall │   F-Measure │
+ ╞════════════╪═════════════╪══════════╪═════════════╡
+ │ OVERALL    │      0.9048 │   0.9048 │      0.9048 │
+ ╘════════════╧═════════════╧══════════╧═════════════╛
+
+ ╒══════════╤═════════════╤══════════╤═════════════╕
+ │ Action   │   Precision │   Recall │   F-Measure │
+ ╞══════════╪═════════════╪══════════╪═════════════╡
+ │ OVERALL  │      0.8761 │   0.8761 │      0.8761 │
+ ╘══════════╧═════════════╧══════════╧═════════════╛
+
+ ╒═════════════════════╤═════════════╤══════════╤═════════════╕
+ │ Intent (scen_act)   │   Precision │   Recall │   F-Measure │
+ ╞═════════════════════╪═════════════╪══════════╪═════════════╡
+ │ OVERALL             │      0.8653 │   0.8653 │      0.8653 │
+ ╘═════════════════════╧═════════════╧══════════╧═════════════╛
+
+ ╒════════════╤═════════════╤══════════╤═════════════╕
+ │ Entities   │   Precision │   Recall │   F-Measure │
+ ╞════════════╪═════════════╪══════════╪═════════════╡
+ │ OVERALL    │      0.7419 │   0.7007 │      0.7207 │
+ ╘════════════╧═════════════╧══════════╧═════════════╛
+
+ ╒════════════════════════════╤═════════════╤══════════╤═════════════╕
+ │ Entities (distance word)   │   Precision │   Recall │   F-Measure │
+ ╞════════════════════════════╪═════════════╪══════════╪═════════════╡
+ │ OVERALL                    │      0.7805 │   0.7414 │      0.7604 │
+ ╘════════════════════════════╧═════════════╧══════════╧═════════════╛
+
+ ╒════════════════════════════╤═════════════╤══════════╤═════════════╕
+ │ Entities (distance char)   │   Precision │   Recall │   F-Measure │
+ ╞════════════════════════════╪═════════════╪══════════╪═════════════╡
+ │ OVERALL                    │      0.8146 │   0.7721 │      0.7928 │
+ ╘════════════════════════════╧═════════════╧══════════╧═════════════╛
+
+ ╒══════════╤═════════════╤══════════╤═════════════╕
+ │ Slu f1   │   Precision │   Recall │   F-Measure │
+ ╞══════════╪═════════════╪══════════╪═════════════╡
+ │ OVERALL  │      0.7972 │   0.7564 │      0.7763 │
+ ╘══════════╧═════════════╧══════════╧═════════════╛
+
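Each F-Measure in score.log should be the harmonic mean of its row's precision and recall, F = 2PR / (P + R). Where P equals R (the Scenario, Action and Intent tables), F equals both. The sketch below recomputes F for the rows where P and R differ, using the four-decimal values printed above, so agreement is expected only to about 1e-4. The helper name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall, reported F-Measure) from the OVERALL rows above
REPORTED = {
    "Entities": (0.7419, 0.7007, 0.7207),
    "Entities (distance word)": (0.7805, 0.7414, 0.7604),
    "Entities (distance char)": (0.8146, 0.7721, 0.7928),
    "Slu f1": (0.7972, 0.7564, 0.7763),
}

if __name__ == "__main__":
    for name, (p, r, reported) in REPORTED.items():
        print(f"{name}: computed {f_measure(p, r):.4f}, reported {reported:.4f}")
```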
exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/valid.acc.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eb1c501888c63379c68912c3f3c25185122a01aa7d12fd4f050dc003fe89e6c1
+ size 382894173
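The `valid.acc.ave_10best.pth` entry above is a Git LFS pointer, not the checkpoint itself. Under the Git LFS pointer format it holds `version`, `oid` and `size` lines. The actual file (382894173 bytes, roughly 383 MB) is stored separately. The standard-library sketch below parses such a pointer and verifies a downloaded file against its `size` and sha256 `oid`. The function names are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def stream_matches_pointer(
    stream: BinaryIO, pointer: Dict[str, str], chunk_size: int = 1 << 20
) -> bool:
    """Hash `stream` in chunks and compare its size and sha256 with `pointer`."""
    algo, _, expected_hex = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    digest = hashlib.sha256()
    size = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    return size == int(pointer["size"]) and digest.hexdigest() == expected_hex


# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:eb1c501888c63379c68912c3f3c25185122a01aa7d12fd4f050dc003fe89e6c1
size 382894173
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER)
    print(pointer["size"], "bytes, about", round(int(pointer["size"]) / 1e6), "MB")
    # A placeholder byte stream, not the real checkpoint, fails the check.
    print(stream_matches_pointer(io.BytesIO(b"placeholder"), pointer))
```

For the real checkpoint, open the downloaded file in binary mode and pass it as `stream`, e.g. `with open(path, "rb") as f: stream_matches_pointer(f, pointer)`. Reading in 1 MiB chunks keeps memory use flat for a file of this size.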
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202204'
+ files:
+   asr_model_file: exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/valid.acc.ave_10best.pth
+ python: "3.9.12 (main, Apr 5 2022, 06:56:58) \n[GCC 7.5.0]"
+ timestamp: 1653637321.587523
+ torch: 1.11.0
+ yaml_files:
+   asr_train_config: exp/asr_train_asr_branchformer_e18_d6_size512_lr1e-3_warmup35k_raw_en_word/config.yaml
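`timestamp` in `meta.yaml` appears to be a Unix epoch time in seconds, recorded when the model was packed. Converting it with the standard library shows when this package was built. `packed_at_utc` is an illustrative helper name.

```python
from datetime import datetime, timezone


def packed_at_utc(timestamp: float) -> datetime:
    """Convert an epoch timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


if __name__ == "__main__":
    # Value copied from meta.yaml above.
    print(packed_at_utc(1653637321.587523).isoformat(timespec="seconds"))
    # prints 2022-05-27T07:42:01+00:00
```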