Commit e887f19 by Siddhant
1 parent: 4645bfe

Update model

Files changed (19)
  1. README.md +642 -0
  2. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/RESULTS.md +29 -0
  3. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/config.yaml +541 -0
  4. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/acc.png +0 -0
  5. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/backward_time.png +0 -0
  6. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/cer.png +0 -0
  7. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/cer_ctc.png +0 -0
  8. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/forward_time.png +0 -0
  9. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/gpu_max_cached_mem_GB.png +0 -0
  10. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/iter_time.png +0 -0
  11. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/loss.png +0 -0
  12. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/loss_att.png +0 -0
  13. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/loss_ctc.png +0 -0
  14. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/optim0_lr0.png +0 -0
  15. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/optim_step_time.png +0 -0
  16. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/train_time.png +0 -0
  17. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/wer.png +0 -0
  18. exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/valid.acc.ave_5best.pth +3 -0
  19. meta.yaml +8 -0
README.md ADDED
@@ -0,0 +1,642 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: en
+ datasets:
+ - fsc_challenge
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `espnet/fsc_challenge_slu_2pass_transformer_gt`
+
+ This model was trained by Siddhant using fsc_challenge recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ ```bash
+ cd espnet
+ git checkout 3b54bfe52a294cdfce668c20d777bfa65f413745
+ pip install -e .
+ cd egs2/fsc_challenge/slu1
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/fsc_challenge_slu_2pass_transformer_gt
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Sun Mar 13 20:59:06 EDT 2022`
+ - python version: `3.8.11 (default, Aug 3 2021, 15:09:35) [GCC 7.5.0]`
+ - espnet version: `espnet 0.10.3a3`
+ - pytorch version: `pytorch 1.9.0+cu102`
+ - Git hash: `97b9dad4dbca71702cb7928a126ec45d96414a3f`
+ - Commit date: `Mon Sep 13 22:55:04 2021 +0900`
+
+ ## asr_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_3_raw_en_word
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |inference_asr_model_valid.acc.ave_5best/spk_test|3349|17937|99.9|0.1|0.0|0.0|0.1|0.6|
+ |inference_asr_model_valid.acc.ave_5best/utt_test|4204|22540|89.8|6.6|3.6|0.0|10.2|27.6|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |inference_asr_model_valid.acc.ave_5best/spk_test|3349|152191|100.0|0.0|0.0|0.0|0.1|0.6|
+ |inference_asr_model_valid.acc.ave_5best/utt_test|4204|191435|94.5|2.8|2.7|0.5|6.0|27.6|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_asr_hubert_transformer_adam_specaug_deliberation_transformer_3.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_3_raw_en_word
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 25
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - train
+     - loss
+     - min
+ -   - valid
+     - loss
+     - min
+ -   - train
+     - acc
+     - max
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 5
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param:
+ - ../../fsc_challenge/asr1/exp/asr_train_asr_hubert_transformer_adam_specaug_old_raw_en_word/valid.acc.ave_5best.pth:encoder:encoder
+ ignore_init_mismatch: false
+ freeze_param:
+ - encoder
+ - postdecoder.model
+ - frontend.upstream
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_en_word/train/speech_shape
+ - exp/asr_stats_raw_en_word/train/text_shape.word
+ - exp/asr_stats_raw_en_word/train/transcript_shape.word
+ valid_shape_file:
+ - exp/asr_stats_raw_en_word/valid/speech_shape
+ - exp/asr_stats_raw_en_word/valid/text_shape.word
+ - exp/asr_stats_raw_en_word/valid/transcript_shape.word
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train/text
+     - text
+     - text
+ -   - dump/raw/train/transcript
+     - transcript
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/valid/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/valid/text
+     - text
+     - text
+ -   - dump/raw/valid/transcript
+     - transcript
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 0.0002
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 25000
+ token_list:
+ - <blank>
+ - <unk>
+ - the
+ - turn
+ - lights
+ - in
+ - up
+ - 'on'
+ - down
+ - temperature
+ - heat
+ - switch
+ - kitchen
+ - volume
+ - 'off'
+ - increase_volume_none
+ - bedroom
+ - washroom
+ - decrease_volume_none
+ - language
+ - bathroom
+ - decrease
+ - my
+ - to
+ - increase
+ - decrease_heat_washroom
+ - increase_heat_washroom
+ - music
+ - heating
+ - bring
+ - increase_heat_none
+ - too
+ - decrease_heat_none
+ - me
+ - change_language_none_none
+ - activate_lights_washroom
+ - set
+ - activate_lights_kitchen
+ - activate_music_none
+ - lamp
+ - deactivate_music_none
+ - increase_heat_bedroom
+ - i
+ - increase_heat_kitchen
+ - sound
+ - get
+ - decrease_heat_kitchen
+ - loud
+ - activate_lights_bedroom
+ - deactivate_lights_bedroom
+ - decrease_heat_bedroom
+ - need
+ - deactivate_lights_kitchen
+ - bring_newspaper_none
+ - newspaper
+ - bring_shoes_none
+ - shoes
+ - bring_socks_none
+ - socks
+ - activate_lights_none
+ - deactivate_lights_none
+ - louder
+ - go
+ - deactivate_lights_washroom
+ - change_language_Chinese_none
+ - chinese
+ - could
+ - you
+ - bring_juice_none
+ - juice
+ - deactivate_lamp_none
+ - make
+ - activate_lamp_none
+ - it
+ - stop
+ - play
+ - change
+ - quiet
+ - change_language_Korean_none
+ - korean
+ - some
+ - practice
+ - change_language_German_none
+ - german
+ - ok
+ - now
+ - main
+ - change_language_English_none
+ - english
+ - its
+ - hear
+ - pause
+ - this
+ - thats
+ - lower
+ - far
+ - audio
+ - please
+ - fetch
+ - phones
+ - a
+ - different
+ - start
+ - resume
+ - softer
+ - couldnt
+ - anything
+ - quieter
+ - put
+ - video
+ - is
+ - low
+ - max
+ - phone
+ - mute
+ - reduce
+ - use
+ - languages
+ - allow
+ - device
+ - system
+ - <sos/eos>
+ transcript_token_list:
+ - <blank>
+ - <unk>
+ - the
+ - turn
+ - lights
+ - in
+ - up
+ - 'on'
+ - down
+ - temperature
+ - heat
+ - switch
+ - kitchen
+ - volume
+ - 'off'
+ - bedroom
+ - washroom
+ - language
+ - bathroom
+ - decrease
+ - my
+ - to
+ - increase
+ - music
+ - heating
+ - bring
+ - too
+ - me
+ - set
+ - lamp
+ - i
+ - sound
+ - get
+ - loud
+ - need
+ - newspaper
+ - shoes
+ - socks
+ - louder
+ - go
+ - chinese
+ - could
+ - you
+ - juice
+ - make
+ - it
+ - stop
+ - play
+ - change
+ - quiet
+ - korean
+ - some
+ - practice
+ - german
+ - ok
+ - now
+ - main
+ - english
+ - its
+ - hear
+ - pause
+ - this
+ - thats
+ - lower
+ - far
+ - audio
+ - please
+ - fetch
+ - phones
+ - a
+ - different
+ - start
+ - resume
+ - softer
+ - couldnt
+ - anything
+ - quieter
+ - put
+ - video
+ - is
+ - low
+ - max
+ - phone
+ - mute
+ - reduce
+ - use
+ - languages
+ - allow
+ - device
+ - system
+ - <sos/eos>
+ two_pass: false
+ pre_postencoder_norm: false
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ model_conf:
+     transcript_token_list:
+     - <blank>
+     - <unk>
+     - the
+     - turn
+     - lights
+     - in
+     - up
+     - 'on'
+     - down
+     - temperature
+     - heat
+     - switch
+     - kitchen
+     - volume
+     - 'off'
+     - bedroom
+     - washroom
+     - language
+     - bathroom
+     - decrease
+     - my
+     - to
+     - increase
+     - music
+     - heating
+     - bring
+     - too
+     - me
+     - set
+     - lamp
+     - i
+     - sound
+     - get
+     - loud
+     - need
+     - newspaper
+     - shoes
+     - socks
+     - louder
+     - go
+     - chinese
+     - could
+     - you
+     - juice
+     - make
+     - it
+     - stop
+     - play
+     - change
+     - quiet
+     - korean
+     - some
+     - practice
+     - german
+     - ok
+     - now
+     - main
+     - english
+     - its
+     - hear
+     - pause
+     - this
+     - thats
+     - lower
+     - far
+     - audio
+     - please
+     - fetch
+     - phones
+     - a
+     - different
+     - start
+     - resume
+     - softer
+     - couldnt
+     - anything
+     - quieter
+     - put
+     - video
+     - is
+     - low
+     - max
+     - phone
+     - mute
+     - reduce
+     - use
+     - languages
+     - allow
+     - device
+     - system
+     - <sos/eos>
+     ctc_weight: 0.5
+     ignore_id: -1
+     lsm_weight: 0.0
+     length_normalized_loss: false
+     report_cer: true
+     report_wer: true
+     sym_space: <space>
+     sym_blank: <blank>
+     extract_feats_in_collect_stats: true
+     two_pass: false
+     pre_postencoder_norm: false
+ use_preprocessor: true
+ token_type: word
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: hubert_large_ll60k
+     download_dir: ./hub
+     multilayer_feature: true
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 30
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_range:
+     - 0
+     - 40
+     num_time_mask: 2
+ normalize: utterance_mvn
+ normalize_conf: {}
+ preencoder: linear
+ preencoder_conf:
+     input_size: 1024
+     output_size: 80
+ encoder: transformer
+ encoder_conf:
+     output_size: 256
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.0
+     input_layer: conv2d
+     normalize_before: true
+ postencoder: null
+ postencoder_conf: {}
+ deliberationencoder: transformer
+ deliberationencoder_conf:
+     output_size: 256
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 4
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.0
+     input_layer: linear
+     normalize_before: true
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.0
+     src_attention_dropout_rate: 0.0
+ decoder2: rnn
+ decoder2_conf: {}
+ postdecoder: hugging_face_transformers
+ postdecoder_conf:
+     model_name_or_path: bert-base-cased
+     output_size: 256
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.3a3
+ distributed: false
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
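
The `token_list` in the README's config mixes transcript words with intent labels such as `activate_lights_kitchen`, while `transcript_token_list` holds words only. The sketch below recovers the intent labels as the difference of the two lists and splits one label into slots. It assumes, and the commit does not state this, that each label follows the Fluent Speech Commands `action_object_location` layout. The lists are short excerpts of the real ones, and the helper names `intent_labels` and `split_intent` are hypothetical.

```python
# Short excerpts of the two lists from the config (not the full lists).
token_list = [
    "<blank>", "<unk>", "the", "turn", "lights",
    "increase_volume_none", "activate_lights_kitchen",
    "change_language_Chinese_none", "<sos/eos>",
]
transcript_token_list = ["<blank>", "<unk>", "the", "turn", "lights", "<sos/eos>"]


def intent_labels(tokens, transcript_tokens):
    """Return tokens that never occur in transcripts, keeping their order."""
    seen = set(transcript_tokens)
    return [t for t in tokens if t not in seen]


def split_intent(label):
    """Split a label into slots (assumed layout: action_object_location)."""
    # Split from the right so a multi-word action such as
    # "change_language" stays in one piece.
    action, obj, location = label.rsplit("_", 2)
    return {"action": action, "object": obj, "location": location}


labels = intent_labels(token_list, transcript_token_list)
for label in labels:
    print(label, "->", split_intent(label))
```

Applied to the full lists in the config, the same difference leaves 31 labels (122 tokens against 91 transcript tokens).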
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/RESULTS.md ADDED
@@ -0,0 +1,29 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Sun Mar 13 20:59:06 EDT 2022`
+ - python version: `3.8.11 (default, Aug 3 2021, 15:09:35) [GCC 7.5.0]`
+ - espnet version: `espnet 0.10.3a3`
+ - pytorch version: `pytorch 1.9.0+cu102`
+ - Git hash: `97b9dad4dbca71702cb7928a126ec45d96414a3f`
+ - Commit date: `Mon Sep 13 22:55:04 2021 +0900`
+
+ ## asr_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_3_raw_en_word
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |inference_asr_model_valid.acc.ave_5best/spk_test|3349|17937|99.9|0.1|0.0|0.0|0.1|0.6|
+ |inference_asr_model_valid.acc.ave_5best/utt_test|4204|22540|89.8|6.6|3.6|0.0|10.2|27.6|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |inference_asr_model_valid.acc.ave_5best/spk_test|3349|152191|100.0|0.0|0.0|0.0|0.1|0.6|
+ |inference_asr_model_valid.acc.ave_5best/utt_test|4204|191435|94.5|2.8|2.7|0.5|6.0|27.6|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
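
In the WER and CER tables, the `Err` column is the sum of the `Sub`, `Del` and `Ins` percentages. A small sanity check on the four reported rows can be sketched as follows. The tolerance is an assumption chosen to absorb the one-decimal rounding of each printed value, and `err_is_consistent` is a hypothetical helper.

```python
# Rows copied from the WER and CER tables: (name, Sub, Del, Ins, Err).
ROWS = [
    ("WER spk_test", 0.1, 0.0, 0.0, 0.1),
    ("WER utt_test", 6.6, 3.6, 0.0, 10.2),
    ("CER spk_test", 0.0, 0.0, 0.0, 0.1),
    ("CER utt_test", 2.8, 2.7, 0.5, 6.0),
]

# Every printed value is rounded to one decimal, so allow some slack
# (assumed tolerance covering the rounding of four separate numbers).
TOLERANCE = 0.2


def err_is_consistent(sub, dele, ins, err, tol=TOLERANCE):
    """Return True if Err matches Sub + Del + Ins up to rounding."""
    return abs((sub + dele + ins) - err) <= tol


for name, sub, dele, ins, err in ROWS:
    total = sub + dele + ins
    status = "ok" if err_is_consistent(sub, dele, ins, err) else "MISMATCH"
    print(f"{name}: Sub+Del+Ins={total:.1f}, reported Err={err} -> {status}")
```

The `CER spk_test` row shows why a tolerance is needed: the three components each print as 0.0 while `Err` prints as 0.1.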
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/config.yaml ADDED
@@ -0,0 +1,541 @@
+ config: conf/tuning/train_asr_hubert_transformer_adam_specaug_deliberation_transformer_3.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_3_raw_en_word
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 25
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - train
+     - loss
+     - min
+ -   - valid
+     - loss
+     - min
+ -   - train
+     - acc
+     - max
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 5
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param:
+ - ../../fsc_challenge/asr1/exp/asr_train_asr_hubert_transformer_adam_specaug_old_raw_en_word/valid.acc.ave_5best.pth:encoder:encoder
+ ignore_init_mismatch: false
+ freeze_param:
+ - encoder
+ - postdecoder.model
+ - frontend.upstream
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_en_word/train/speech_shape
+ - exp/asr_stats_raw_en_word/train/text_shape.word
+ - exp/asr_stats_raw_en_word/train/transcript_shape.word
+ valid_shape_file:
+ - exp/asr_stats_raw_en_word/valid/speech_shape
+ - exp/asr_stats_raw_en_word/valid/text_shape.word
+ - exp/asr_stats_raw_en_word/valid/transcript_shape.word
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train/text
+     - text
+     - text
+ -   - dump/raw/train/transcript
+     - transcript
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/valid/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/valid/text
+     - text
+     - text
+ -   - dump/raw/valid/transcript
+     - transcript
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 0.0002
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 25000
+ token_list:
+ - <blank>
+ - <unk>
+ - the
+ - turn
+ - lights
+ - in
+ - up
+ - 'on'
+ - down
+ - temperature
+ - heat
+ - switch
+ - kitchen
+ - volume
+ - 'off'
+ - increase_volume_none
+ - bedroom
+ - washroom
+ - decrease_volume_none
+ - language
+ - bathroom
+ - decrease
+ - my
+ - to
+ - increase
+ - decrease_heat_washroom
+ - increase_heat_washroom
+ - music
+ - heating
+ - bring
+ - increase_heat_none
+ - too
+ - decrease_heat_none
+ - me
+ - change_language_none_none
+ - activate_lights_washroom
+ - set
+ - activate_lights_kitchen
+ - activate_music_none
+ - lamp
+ - deactivate_music_none
+ - increase_heat_bedroom
+ - i
+ - increase_heat_kitchen
+ - sound
+ - get
+ - decrease_heat_kitchen
+ - loud
+ - activate_lights_bedroom
+ - deactivate_lights_bedroom
+ - decrease_heat_bedroom
+ - need
+ - deactivate_lights_kitchen
+ - bring_newspaper_none
+ - newspaper
+ - bring_shoes_none
+ - shoes
+ - bring_socks_none
+ - socks
+ - activate_lights_none
+ - deactivate_lights_none
+ - louder
+ - go
+ - deactivate_lights_washroom
+ - change_language_Chinese_none
+ - chinese
+ - could
+ - you
+ - bring_juice_none
+ - juice
+ - deactivate_lamp_none
+ - make
+ - activate_lamp_none
+ - it
+ - stop
+ - play
+ - change
+ - quiet
+ - change_language_Korean_none
+ - korean
+ - some
+ - practice
+ - change_language_German_none
+ - german
+ - ok
+ - now
+ - main
+ - change_language_English_none
+ - english
+ - its
+ - hear
+ - pause
+ - this
+ - thats
+ - lower
+ - far
+ - audio
+ - please
+ - fetch
+ - phones
+ - a
+ - different
+ - start
+ - resume
+ - softer
+ - couldnt
+ - anything
+ - quieter
+ - put
+ - video
+ - is
+ - low
+ - max
+ - phone
+ - mute
+ - reduce
+ - use
+ - languages
+ - allow
+ - device
+ - system
+ - <sos/eos>
+ transcript_token_list:
+ - <blank>
+ - <unk>
+ - the
+ - turn
+ - lights
+ - in
+ - up
+ - 'on'
+ - down
+ - temperature
+ - heat
+ - switch
+ - kitchen
+ - volume
+ - 'off'
+ - bedroom
+ - washroom
+ - language
+ - bathroom
+ - decrease
+ - my
+ - to
+ - increase
+ - music
+ - heating
+ - bring
+ - too
+ - me
+ - set
+ - lamp
+ - i
+ - sound
+ - get
+ - loud
+ - need
+ - newspaper
+ - shoes
+ - socks
+ - louder
+ - go
+ - chinese
+ - could
+ - you
+ - juice
+ - make
+ - it
+ - stop
+ - play
+ - change
+ - quiet
+ - korean
+ - some
+ - practice
+ - german
+ - ok
+ - now
+ - main
+ - english
+ - its
+ - hear
+ - pause
+ - this
+ - thats
+ - lower
+ - far
+ - audio
+ - please
+ - fetch
+ - phones
+ - a
+ - different
+ - start
+ - resume
+ - softer
+ - couldnt
+ - anything
+ - quieter
+ - put
+ - video
+ - is
+ - low
+ - max
+ - phone
+ - mute
+ - reduce
+ - use
+ - languages
+ - allow
+ - device
+ - system
+ - <sos/eos>
+ two_pass: false
+ pre_postencoder_norm: false
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ model_conf:
+     transcript_token_list:
+     - <blank>
+     - <unk>
+     - the
+     - turn
+     - lights
+     - in
+     - up
+     - 'on'
+     - down
+     - temperature
+     - heat
+     - switch
+     - kitchen
+     - volume
+     - 'off'
+     - bedroom
+     - washroom
+     - language
+     - bathroom
+     - decrease
+     - my
+     - to
+     - increase
+     - music
+     - heating
+     - bring
+     - too
+     - me
+     - set
+     - lamp
+     - i
+     - sound
+     - get
+     - loud
+     - need
+     - newspaper
+     - shoes
+     - socks
+     - louder
+     - go
+     - chinese
+     - could
+     - you
+     - juice
+     - make
+     - it
+     - stop
+     - play
+     - change
+     - quiet
+     - korean
+     - some
+     - practice
+     - german
+     - ok
+     - now
+     - main
+     - english
+     - its
+     - hear
+     - pause
+     - this
+     - thats
+     - lower
+     - far
+     - audio
+     - please
+     - fetch
+     - phones
+     - a
+     - different
+     - start
+     - resume
+     - softer
+     - couldnt
+     - anything
+     - quieter
+     - put
+     - video
+     - is
+     - low
+     - max
+     - phone
+     - mute
+     - reduce
+     - use
+     - languages
+     - allow
+     - device
+     - system
+     - <sos/eos>
+     ctc_weight: 0.5
+     ignore_id: -1
+     lsm_weight: 0.0
+     length_normalized_loss: false
+     report_cer: true
+     report_wer: true
+     sym_space: <space>
+     sym_blank: <blank>
+     extract_feats_in_collect_stats: true
+     two_pass: false
+     pre_postencoder_norm: false
+ use_preprocessor: true
+ token_type: word
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: hubert_large_ll60k
+     download_dir: ./hub
+     multilayer_feature: true
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 30
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_range:
+     - 0
+     - 40
+     num_time_mask: 2
+ normalize: utterance_mvn
+ normalize_conf: {}
+ preencoder: linear
+ preencoder_conf:
+     input_size: 1024
+     output_size: 80
+ encoder: transformer
+ encoder_conf:
+     output_size: 256
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.0
+     input_layer: conv2d
+     normalize_before: true
+ postencoder: null
+ postencoder_conf: {}
+ deliberationencoder: transformer
+ deliberationencoder_conf:
+     output_size: 256
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 4
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.0
+     input_layer: linear
+     normalize_before: true
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.0
+     src_attention_dropout_rate: 0.0
+ decoder2: rnn
+ decoder2_conf: {}
+ postdecoder: hugging_face_transformers
+ postdecoder_conf:
+     model_name_or_path: bert-base-cased
+     output_size: 256
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.3a3
+ distributed: false
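
The config pairs `optim_conf.lr: 0.0002` with `scheduler: warmuplr` and `warmup_steps: 25000`. As a rough illustration, the sketch below evaluates that schedule in plain Python. It assumes ESPnet2's documented `WarmupLR` rule, `lr = base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)`, applies unchanged in this version. `warmup_lr` is a hypothetical helper, not an ESPnet function.

```python
def warmup_lr(step, base_lr=0.0002, warmup_steps=25000):
    """Learning rate at a 1-based optimizer step under the assumed rule."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear ramp while step < warmup_steps, inverse-sqrt decay afterwards.
    scale = min(step ** -0.5, step * warmup_steps ** -1.5)
    return base_lr * warmup_steps ** 0.5 * scale


for step in (1, 12500, 25000, 100000):
    print(f"step {step:>6}: lr = {warmup_lr(step):.3e}")
```

Under this rule the rate climbs linearly, reaches the configured 0.0002 exactly at step 25000, and then decays with the inverse square root of the step, so it is back at half the peak by step 100000.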
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/acc.png ADDED
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/backward_time.png ADDED
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/cer.png ADDED
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/cer_ctc.png ADDED
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/forward_time.png ADDED
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/gpu_max_cached_mem_GB.png ADDED
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/iter_time.png ADDED
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/loss.png ADDED
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/loss_att.png ADDED
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/loss_ctc.png ADDED
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/optim0_lr0.png ADDED
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/optim_step_time.png ADDED
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/train_time.png ADDED
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/images/wer.png ADDED
exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/valid.acc.ave_5best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15c1f3a4781fee2b987e4dd831b82e85c8b702959ddf4df960761214f8c1b1c3
+ size 1831278649
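
The three added lines are a Git LFS pointer: the repository stores this small text file, and the checkpoint itself lives in LFS storage. A minimal sketch of reading such a pointer, using only the values shown here, might look like this. `parse_lfs_pointer` is a hypothetical helper name.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:15c1f3a4781fee2b987e4dd831b82e85c8b702959ddf4df960761214f8c1b1c3
size 1831278649
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),  # size of the real file, in bytes
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], pointer["digest"][:12] + "...")
print(f"{pointer['size_bytes'] / 1e9:.2f} GB")
```

The `size` field puts the averaged checkpoint at roughly 1.83 GB, which is why the diff shows `+3 -0` for a `.pth` file.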
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202207'
+ files:
+   slu_model_file: exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/valid.acc.ave_5best.pth
+ python: "3.9.5 (default, Jun 4 2021, 12:28:51) \n[GCC 7.5.0]"
+ timestamp: 1663105666.18087
+ torch: 1.12.1+cu113
+ yaml_files:
+   slu_train_config: exp/slu_train_asr_hubert_transformer_adam_specaug_deliberation_transformer_gt_raw_en_word/config.yaml
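
The `timestamp` field in `meta.yaml` looks like Unix epoch seconds. That reading is an assumption based on the value's magnitude, since the file does not say so. Under it, the packing time can be recovered with the standard library:

```python
from datetime import datetime, timezone

# Value copied from meta.yaml; assumed to be seconds since the Unix epoch.
TIMESTAMP = 1663105666.18087

packed_at = datetime.fromtimestamp(TIMESTAMP, tz=timezone.utc)
print(packed_at.isoformat())
```

Read this way, the model was packed in September 2022, about six months after the `Sun Mar 13 20:59:06 EDT 2022` date recorded in RESULTS.md. That would also be consistent with `meta.yaml` listing newer `espnet`, `python` and `torch` versions than the results environment.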