Darshan Prabhu committed on
Commit
cb6f14c
1 Parent(s): cef3855

Update model

README.md CHANGED
@@ -1,3 +1,948 @@
1
- ---
2
- license: cc-by-4.0
3
- ---
1
+ ---
2
+ tags:
3
+ - espnet
4
+ - audio
5
+ - automatic-speech-recognition
6
+ language: en
7
+ datasets:
8
+ - slurp_entity
9
+ license: cc-by-4.0
10
+ ---
11
+
12
+ ## ESPnet2 ASR model
13
+
14
+ ### `Darshan7575/slurp_multiconvformer_conv_fusion`
15
+
16
+ This model was trained by Darshan using the slurp_entity recipe in [espnet](https://github.com/espnet/espnet/).
17
+
18
+ ### Demo: How to use in ESPnet2
19
+
20
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
21
+ if you haven't done that already.
22
+
23
+ ```bash
24
+ cd espnet
25
+ git checkout a50d6a0c8c31b4ef775473a657de031a40be30c1
26
+ pip install -e .
27
+ cd egs2_imp/slurp_entity/asr1
28
+ ./run.sh --skip_data_prep false --skip_train true --download_model Darshan7575/slurp_multiconvformer_conv_fusion
29
+ ```
30
+
31
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
32
+ # RESULTS
33
+ ## Environments
34
+ - date: `Wed Feb 21 01:04:03 EST 2024`
35
+ - python version: `3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]`
36
+ - espnet version: `espnet 202310`
37
+ - pytorch version: `pytorch 2.1.2+cu118`
38
+ - Git hash: `edb6ec64bb5d4f2c68a3b81674f0c2822e2e5b58`
39
+ - Commit date: `Fri Feb 9 21:26:35 2024 +0530`
40
+
41
+ ## exp/slurp_multiconvformer_conv_fusion
42
+ ### WER
43
+
44
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
45
+ |---|---|---|---|---|---|---|---|---|
46
+ |decode_asr_asr_model_valid.acc.ave_10best/test|13078|262176|84.1|7.5|8.5|2.7|18.7|47.0|
47
+
48
+ ### CER
49
+
50
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
51
+ |---|---|---|---|---|---|---|---|---|
52
+ |decode_asr_asr_model_valid.acc.ave_10best/test|13078|1245475|90.0|3.0|7.0|3.1|13.1|47.0|
53
+
54
+ ### TER
55
+
56
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
57
+ |---|---|---|---|---|---|---|---|---|
58
+ ## exp/slurp_multiconvformer_conv_fusion/decode_asr_asr_model_valid.acc.ave_10best
59
+ ### WER
60
+
61
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
62
+ |---|---|---|---|---|---|---|---|---|
63
+ |org/devel|8690|178058|84.9|7.4|7.7|3.1|18.1|48.7|
64
+
65
+ ### CER
66
+
67
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
68
+ |---|---|---|---|---|---|---|---|---|
69
+ |org/devel|8690|847400|90.9|2.9|6.2|3.4|12.5|48.7|
70
+
71
+ ### TER
72
+
73
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
74
+ |---|---|---|---|---|---|---|---|---|
75
+
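As a quick sanity check on the tables above, the `Err` column should equal `Sub + Del + Ins` (all in percent of reference words or characters), up to the one-decimal rounding applied to each column. The sketch below only re-adds the numbers copied from the WER and CER rows; the helper names `error_rate` and `max_rounding_gap` are illustrative and not part of ESPnet.

```python
# Rows copied from the WER/CER tables: (label, Sub, Del, Ins, reported Err), in percent.
ROWS = [
    ("WER test", 7.5, 8.5, 2.7, 18.7),
    ("CER test", 3.0, 7.0, 3.1, 13.1),
    ("WER org/devel", 7.4, 7.7, 3.1, 18.1),
    ("CER org/devel", 2.9, 6.2, 3.4, 12.5),
]


def error_rate(sub, dele, ins):
    """Error rate as the sum of substitution, deletion and insertion rates."""
    return sub + dele + ins


def max_rounding_gap(rows):
    """Largest absolute difference between the recomputed and reported Err."""
    return max(abs(error_rate(s, d, i) - err) for _, s, d, i, err in rows)


for label, s, d, i, err in ROWS:
    print(f"{label}: Sub+Del+Ins = {error_rate(s, d, i):.1f}, reported Err = {err}")
```

A gap of about 0.1 on a row (as for the `org/devel` WER) is expected, because each column is rounded independently before being written to the table.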
76
+ ## ASR config
77
+
78
+ <details><summary>expand</summary>
79
+
80
+ ```
81
+ config: conf/tuning/train_asr_multiconv_e12_mlp3072_linear2048_layerdrop.yaml
82
+ print_config: false
83
+ log_level: INFO
84
+ drop_last_iter: false
85
+ dry_run: false
86
+ iterator_type: sequence
87
+ valid_iterator_type: null
88
+ output_dir: exp/slurp_multiconvformer_conv_fusion
89
+ ngpu: 1
90
+ seed: 0
91
+ num_workers: 1
92
+ num_att_plot: 3
93
+ dist_backend: nccl
94
+ dist_init_method: env://
95
+ dist_world_size: 2
96
+ dist_rank: 0
97
+ local_rank: 0
98
+ dist_master_addr: localhost
99
+ dist_master_port: 40439
100
+ dist_launcher: null
101
+ multiprocessing_distributed: true
102
+ unused_parameters: true
103
+ sharded_ddp: false
104
+ cudnn_enabled: true
105
+ cudnn_benchmark: false
106
+ cudnn_deterministic: true
107
+ collect_stats: false
108
+ write_collected_feats: false
109
+ max_epoch: 60
110
+ patience: null
111
+ val_scheduler_criterion:
112
+ - valid
113
+ - loss
114
+ early_stopping_criterion:
115
+ - valid
116
+ - loss
117
+ - min
118
+ best_model_criterion:
119
+ - - valid
120
+ - acc
121
+ - max
122
+ keep_nbest_models: 10
123
+ nbest_averaging_interval: 0
124
+ grad_clip: 5.0
125
+ grad_clip_type: 2.0
126
+ grad_noise: false
127
+ accum_grad: 1
128
+ no_forward_run: false
129
+ resume: true
130
+ train_dtype: float32
131
+ use_amp: false
132
+ log_interval: null
133
+ use_matplotlib: true
134
+ use_tensorboard: true
135
+ create_graph_in_tensorboard: false
136
+ use_wandb: false
137
+ wandb_project: null
138
+ wandb_id: null
139
+ wandb_entity: null
140
+ wandb_name: null
141
+ wandb_model_log_interval: -1
142
+ detect_anomaly: false
143
+ use_lora: false
144
+ save_lora_only: true
145
+ lora_conf: {}
146
+ pretrain_path: null
147
+ init_param: []
148
+ ignore_init_mismatch: false
149
+ freeze_param: []
150
+ num_iters_per_epoch: null
151
+ batch_size: 64
152
+ valid_batch_size: null
153
+ batch_bins: 1000000
154
+ valid_batch_bins: null
155
+ train_shape_file:
156
+ - exp/asr_stats_raw_en_word/train/speech_shape
157
+ - exp/asr_stats_raw_en_word/train/text_shape.word
158
+ valid_shape_file:
159
+ - exp/asr_stats_raw_en_word/valid/speech_shape
160
+ - exp/asr_stats_raw_en_word/valid/text_shape.word
161
+ batch_type: folded
162
+ valid_batch_type: null
163
+ fold_length:
164
+ - 80000
165
+ - 150
166
+ sort_in_batch: descending
167
+ shuffle_within_batch: false
168
+ sort_batch: descending
169
+ multiple_iterator: false
170
+ chunk_length: 500
171
+ chunk_shift_ratio: 0.5
172
+ num_cache_chunks: 1024
173
+ chunk_excluded_key_prefixes: []
174
+ chunk_default_fs: null
175
+ train_data_path_and_name_and_type:
176
+ - - dump/raw/train/wav.scp
177
+ - speech
178
+ - kaldi_ark
179
+ - - dump/raw/train/text
180
+ - text
181
+ - text
182
+ valid_data_path_and_name_and_type:
183
+ - - dump/raw/devel/wav.scp
184
+ - speech
185
+ - kaldi_ark
186
+ - - dump/raw/devel/text
187
+ - text
188
+ - text
189
+ allow_variable_data_keys: false
190
+ max_cache_size: 0.0
191
+ max_cache_fd: 32
192
+ allow_multi_rates: false
193
+ valid_max_cache_size: null
194
+ exclude_weight_decay: false
195
+ exclude_weight_decay_conf: {}
196
+ optim: adam
197
+ optim_conf:
198
+ lr: 0.001
199
+ weight_decay: 1.0e-06
200
+ scheduler: warmuplr
201
+ scheduler_conf:
202
+ warmup_steps: 35000
203
+ token_list:
204
+ - <blank>
205
+ - <unk>
206
+ - ▁SEP
207
+ - ▁FILL
208
+ - s
209
+ - ▁the
210
+ - a
211
+ - ▁to
212
+ - ▁i
213
+ - ▁me
214
+ - e
215
+ - ▁s
216
+ - ▁a
217
+ - i
218
+ - ▁you
219
+ - ▁what
220
+ - er
221
+ - ing
222
+ - u
223
+ - ▁is
224
+ - ''''
225
+ - o
226
+ - p
227
+ - ▁in
228
+ - ▁p
229
+ - y
230
+ - ▁my
231
+ - ▁please
232
+ - d
233
+ - c
234
+ - m
235
+ - ▁b
236
+ - l
237
+ - ▁m
238
+ - ▁c
239
+ - st
240
+ - date
241
+ - n
242
+ - ▁d
243
+ - le
244
+ - b
245
+ - ▁for
246
+ - re
247
+ - t
248
+ - ▁on
249
+ - en
250
+ - h
251
+ - 'on'
252
+ - ar
253
+ - person
254
+ - ▁re
255
+ - ▁f
256
+ - ▁g
257
+ - ▁of
258
+ - an
259
+ - ▁
260
+ - g
261
+ - ▁today
262
+ - ▁t
263
+ - or
264
+ - ▁it
265
+ - ▁this
266
+ - ▁h
267
+ - r
268
+ - f
269
+ - at
270
+ - ch
271
+ - ce
272
+ - place_name
273
+ - ▁email
274
+ - ▁do
275
+ - es
276
+ - ri
277
+ - ▁e
278
+ - ▁w
279
+ - ic
280
+ - in
281
+ - ▁that
282
+ - event_name
283
+ - ▁play
284
+ - ▁and
285
+ - al
286
+ - ▁n
287
+ - ▁can
288
+ - email_query
289
+ - ve
290
+ - ▁new
291
+ - day
292
+ - it
293
+ - ate
294
+ - ▁from
295
+ - ▁have
296
+ - k
297
+ - time
298
+ - ▁am
299
+ - media_type
300
+ - email_sendemail
301
+ - ent
302
+ - ▁olly
303
+ - qa_factoid
304
+ - se
305
+ - v
306
+ - et
307
+ - ck
308
+ - ▁any
309
+ - calendar_set
310
+ - ly
311
+ - th
312
+ - ▁how
313
+ - ▁meeting
314
+ - ed
315
+ - ▁tell
316
+ - ▁st
317
+ - x
318
+ - ur
319
+ - ro
320
+ - ▁at
321
+ - nd
322
+ - ▁list
323
+ - w
324
+ - ▁u
325
+ - ou
326
+ - ▁not
327
+ - ▁about
328
+ - ▁an
329
+ - ▁o
330
+ - general_negate
331
+ - ut
332
+ - ▁time
333
+ - ▁be
334
+ - ▁ch
335
+ - ▁are
336
+ - social_post
337
+ - business_name
338
+ - la
339
+ - ty
340
+ - play_music
341
+ - ot
342
+ - general_quirky
343
+ - ▁l
344
+ - ▁sh
345
+ - ▁tweet
346
+ - om
347
+ - ▁week
348
+ - um
349
+ - ▁one
350
+ - ter
351
+ - ▁he
352
+ - ▁up
353
+ - ▁com
354
+ - general_praise
355
+ - weather_query
356
+ - ▁next
357
+ - ▁th
358
+ - ▁check
359
+ - calendar_query
360
+ - ▁last
361
+ - ▁ro
362
+ - ad
363
+ - is
364
+ - ▁with
365
+ - ay
366
+ - ▁send
367
+ - pe
368
+ - ▁pm
369
+ - ▁tomorrow
370
+ - ▁j
371
+ - un
372
+ - ▁train
373
+ - general_explain
374
+ - ▁v
375
+ - one
376
+ - ▁r
377
+ - ra
378
+ - news_query
379
+ - ation
380
+ - ▁emails
381
+ - us
382
+ - if
383
+ - ct
384
+ - ▁co
385
+ - ▁add
386
+ - ▁will
387
+ - ▁se
388
+ - nt
389
+ - ▁was
390
+ - ine
391
+ - ▁de
392
+ - ▁set
393
+ - ▁ex
394
+ - ▁would
395
+ - ir
396
+ - ow
397
+ - ber
398
+ - general_repeat
399
+ - ight
400
+ - ook
401
+ - ▁again
402
+ - ▁song
403
+ - currency_name
404
+ - ll
405
+ - ▁ha
406
+ - ▁go
407
+ - relation
408
+ - te
409
+ - ion
410
+ - and
411
+ - ▁y
412
+ - ▁ye
413
+ - general_affirm
414
+ - general_confirm
415
+ - ery
416
+ - ▁po
417
+ - ff
418
+ - ▁we
419
+ - ▁turn
420
+ - ▁did
421
+ - ▁mar
422
+ - ▁alarm
423
+ - ▁like
424
+ - datetime_query
425
+ - ers
426
+ - ▁all
427
+ - ▁remind
428
+ - ▁so
429
+ - qa_definition
430
+ - ▁calendar
431
+ - end
432
+ - ▁said
433
+ - ci
434
+ - ▁off
435
+ - ▁john
436
+ - ▁day
437
+ - ss
438
+ - pla
439
+ - ume
440
+ - ▁get
441
+ - ail
442
+ - pp
443
+ - z
444
+ - ry
445
+ - am
446
+ - ▁need
447
+ - as
448
+ - ▁thank
449
+ - ▁wh
450
+ - ▁want
451
+ - ▁right
452
+ - ▁jo
453
+ - ▁facebook
454
+ - ▁k
455
+ - ge
456
+ - ld
457
+ - ▁fri
458
+ - ▁two
459
+ - general_dontcare
460
+ - ▁news
461
+ - ol
462
+ - oo
463
+ - ant
464
+ - ▁five
465
+ - ▁event
466
+ - ake
467
+ - definition_word
468
+ - transport_type
469
+ - ▁your
470
+ - vi
471
+ - orn
472
+ - op
473
+ - ▁weather
474
+ - ome
475
+ - ▁app
476
+ - ▁lo
477
+ - de
478
+ - ▁music
479
+ - weather_descriptor
480
+ - ak
481
+ - ke
482
+ - ▁there
483
+ - ▁si
484
+ - ▁lights
485
+ - ▁now
486
+ - ▁mo
487
+ - calendar_remove
488
+ - our
489
+ - ▁dollar
490
+ - food_type
491
+ - me
492
+ - ▁more
493
+ - ▁no
494
+ - ▁birthday
495
+ - orrect
496
+ - ▁rep
497
+ - ▁show
498
+ - play_radio
499
+ - ▁mon
500
+ - ▁does
501
+ - ood
502
+ - ag
503
+ - li
504
+ - ▁sto
505
+ - ▁contact
506
+ - cket
507
+ - email_querycontact
508
+ - ▁ev
509
+ - ▁could
510
+ - ange
511
+ - ▁just
512
+ - out
513
+ - ame
514
+ - .
515
+ - ▁ja
516
+ - ▁confirm
517
+ - qa_currency
518
+ - ▁man
519
+ - ▁late
520
+ - ▁think
521
+ - ▁some
522
+ - timeofday
523
+ - ▁bo
524
+ - qa_stock
525
+ - ong
526
+ - ▁start
527
+ - ▁work
528
+ - ▁ten
529
+ - int
530
+ - ▁command
531
+ - all
532
+ - ▁make
533
+ - ▁la
534
+ - j
535
+ - ▁answ
536
+ - ▁hour
537
+ - ▁cle
538
+ - ah
539
+ - ▁find
540
+ - ▁service
541
+ - ▁fa
542
+ - qu
543
+ - general_commandstop
544
+ - ai
545
+ - ▁when
546
+ - ▁te
547
+ - ▁by
548
+ - social_query
549
+ - ard
550
+ - ▁tw
551
+ - ul
552
+ - id
553
+ - ▁seven
554
+ - ▁where
555
+ - ▁much
556
+ - art
557
+ - ▁appointment
558
+ - ver
559
+ - artist_name
560
+ - el
561
+ - device_type
562
+ - ▁know
563
+ - ▁three
564
+ - ▁events
565
+ - ▁tr
566
+ - ▁li
567
+ - ork
568
+ - red
569
+ - ect
570
+ - ▁let
571
+ - ▁respon
572
+ - ▁par
573
+ - zz
574
+ - ▁give
575
+ - ▁twenty
576
+ - ▁ti
577
+ - ▁curre
578
+ - play_podcasts
579
+ - ▁radio
580
+ - cooking_recipe
581
+ - transport_query
582
+ - ▁con
583
+ - gh
584
+ - ▁le
585
+ - lists_query
586
+ - ▁rem
587
+ - recommendation_events
588
+ - house_place
589
+ - alarm_set
590
+ - play_audiobook
591
+ - ist
592
+ - ase
593
+ - music_genre
594
+ - ive
595
+ - ast
596
+ - player_setting
597
+ - ort
598
+ - lly
599
+ - news_topic
600
+ - list_name
601
+ - ▁playlist
602
+ - ▁ne
603
+ - business_type
604
+ - personal_info
605
+ - ind
606
+ - ust
607
+ - di
608
+ - ress
609
+ - recommendation_locations
610
+ - lists_createoradd
611
+ - iot_hue_lightoff
612
+ - lists_remove
613
+ - ord
614
+ - ▁light
615
+ - ere
616
+ - alarm_query
617
+ - audio_volume_mute
618
+ - music_query
619
+ - ▁audio
620
+ - rain
621
+ - ▁date
622
+ - ▁order
623
+ - audio_volume_up
624
+ - ▁ar
625
+ - ▁podcast
626
+ - transport_ticket
627
+ - mail
628
+ - iot_hue_lightchange
629
+ - iot_coffee
630
+ - radio_name
631
+ - ill
632
+ - ▁ri
633
+ - '@'
634
+ - takeaway_query
635
+ - song_name
636
+ - takeaway_order
637
+ - ▁ra
638
+ - email_addcontact
639
+ - play_game
640
+ - book
641
+ - transport_traffic
642
+ - ▁house
643
+ - music_likeness
644
+ - her
645
+ - transport_taxi
646
+ - iot_hue_lightdim
647
+ - ment
648
+ - ght
649
+ - fo
650
+ - order_type
651
+ - color_type
652
+ - '1'
653
+ - ven
654
+ - ould
655
+ - general_joke
656
+ - ess
657
+ - ain
658
+ - qa_maths
659
+ - ▁place
660
+ - ▁twe
661
+ - cast
662
+ - iot_cleaning
663
+ - ▁che
664
+ - ▁cont
665
+ - ith
666
+ - audiobook_name
667
+ - email_address
668
+ - game_name
669
+ - ▁cal
670
+ - general_frequency
671
+ - ▁tom
672
+ - ▁food
673
+ - act
674
+ - iot_hue_lightup
675
+ - '2'
676
+ - alarm_remove
677
+ - podcast_descriptor
678
+ - ▁definition
679
+ - audio_volume_down
680
+ - ▁media
681
+ - email_folder
682
+ - dia
683
+ - meal_type
684
+ - ▁mus
685
+ - recommendation_movies
686
+ - ▁ad
687
+ - ree
688
+ - pt
689
+ - now
690
+ - playlist_name
691
+ - ▁person
692
+ - change_amount
693
+ - ▁pla
694
+ - escri
695
+ - datetime_convert
696
+ - podcast_name
697
+ - ▁ab
698
+ - time_zone
699
+ - ▁def
700
+ - ting
701
+ - iot_wemo_on
702
+ - music_settings
703
+ - iot_wemo_off
704
+ - orre
705
+ - cy
706
+ - ank
707
+ - music_descriptor
708
+ - lar
709
+ - app_name
710
+ - row
711
+ - joke_type
712
+ - xt
713
+ - of
714
+ - ition
715
+ - ▁meet
716
+ - ink
717
+ - ▁confir
718
+ - transport_agency
719
+ - general_greet
720
+ - ▁business
721
+ - ▁art
722
+ - ▁ag
723
+ - urn
724
+ - escript
725
+ - rom
726
+ - ▁rel
727
+ - ▁au
728
+ - ▁currency
729
+ - audio_volume_other
730
+ - iot_hue_lighton
731
+ - ▁artist
732
+ - '?'
733
+ - ▁bus
734
+ - cooking_type
735
+ - movie_name
736
+ - coffee_type
737
+ - ingredient
738
+ - ather
739
+ - music_dislikeness
740
+ - sp
741
+ - q
742
+ - ▁ser
743
+ - esc
744
+ - ▁bir
745
+ - ▁cur
746
+ - name
747
+ - ▁tran
748
+ - ▁hou
749
+ - ek
750
+ - uch
751
+ - ▁conf
752
+ - ▁face
753
+ - '9'
754
+ - ▁birth
755
+ - I
756
+ - sw
757
+ - transport_descriptor
758
+ - ▁comm
759
+ - lease
760
+ - transport_name
761
+ - aid
762
+ - movie_type
763
+ - ▁device
764
+ - alarm_type
765
+ - audiobook_author
766
+ - '5'
767
+ - drink_type
768
+ - ▁joh
769
+ - ▁defin
770
+ - word
771
+ - ▁curren
772
+ - order
773
+ - iness
774
+ - W
775
+ - cooking_query
776
+ - sport_type
777
+ - ▁relation
778
+ - oint
779
+ - H
780
+ - '8'
781
+ - A
782
+ - '0'
783
+ - ▁dol
784
+ - vice
785
+ - ▁pers
786
+ - '&'
787
+ - T
788
+ - ▁appoint
789
+ - _
790
+ - '7'
791
+ - '3'
792
+ - '-'
793
+ - game_type
794
+ - ▁pod
795
+ - N
796
+ - M
797
+ - E
798
+ - list
799
+ - music_album
800
+ - dio
801
+ - ▁transport
802
+ - qa_query
803
+ - C
804
+ - O
805
+ - U
806
+ - query_detail
807
+ - ']'
808
+ - '['
809
+ - descriptor
810
+ - ':'
811
+ - spon
812
+ - <sos/eos>
813
+ init: null
814
+ input_size: null
815
+ ctc_conf:
816
+ dropout_rate: 0.0
817
+ ctc_type: builtin
818
+ reduce: true
819
+ ignore_nan_grad: null
820
+ zero_infinity: true
821
+ brctc_risk_strategy: exp
822
+ brctc_group_strategy: end
823
+ brctc_risk_factor: 0.0
824
+ joint_net_conf: null
825
+ use_preprocessor: true
826
+ use_lang_prompt: false
827
+ use_nlp_prompt: false
828
+ token_type: word
829
+ bpemodel: null
830
+ non_linguistic_symbols: null
831
+ cleaner: null
832
+ g2p: null
833
+ speech_volume_normalize: null
834
+ rir_scp: null
835
+ rir_apply_prob: 1.0
836
+ noise_scp: null
837
+ noise_apply_prob: 1.0
838
+ noise_db_range: '13_15'
839
+ short_noise_thres: 0.5
840
+ aux_ctc_tasks: []
841
+ frontend: default
842
+ frontend_conf:
843
+ fs: 16k
844
+ specaug: specaug
845
+ specaug_conf:
846
+ apply_time_warp: true
847
+ time_warp_window: 5
848
+ time_warp_mode: bicubic
849
+ apply_freq_mask: true
850
+ freq_mask_width_range:
851
+ - 0
852
+ - 30
853
+ num_freq_mask: 2
854
+ apply_time_mask: true
855
+ time_mask_width_range:
856
+ - 0
857
+ - 40
858
+ num_time_mask: 2
859
+ normalize: utterance_mvn
860
+ normalize_conf: {}
861
+ model: espnet
862
+ model_conf:
863
+ ctc_weight: 0.3
864
+ lsm_weight: 0.1
865
+ length_normalized_loss: false
866
+ extract_feats_in_collect_stats: false
867
+ preencoder: null
868
+ preencoder_conf: {}
869
+ encoder: multiconv_conformer
870
+ encoder_conf:
871
+ output_size: 512
872
+ attention_heads: 8
873
+ selfattention_layer_type: rel_selfattn
874
+ pos_enc_layer_type: rel_pos
875
+ rel_pos_type: latest
876
+ cgmlp_linear_units: 3072
877
+ multicgmlp_type: concat_fusion
878
+ multicgmlp_kernel_sizes: 7,15,23,31
879
+ multicgmlp_merge_conv_kernel: 31
880
+ use_linear_after_conv: false
881
+ gate_activation: identity
882
+ num_blocks: 12
883
+ dropout_rate: 0.1
884
+ positional_dropout_rate: 0.1
885
+ attention_dropout_rate: 0.1
886
+ input_layer: conv2d
887
+ layer_drop_rate: 0.1
888
+ linear_units: 1152
889
+ positionwise_layer_type: linear
890
+ macaron_style: true
891
+ use_cnn_module: true
892
+ postencoder: null
893
+ postencoder_conf: {}
894
+ decoder: transformer
895
+ decoder_conf:
896
+ attention_heads: 8
897
+ linear_units: 2048
898
+ num_blocks: 6
899
+ dropout_rate: 0.1
900
+ positional_dropout_rate: 0.1
901
+ self_attention_dropout_rate: 0.1
902
+ src_attention_dropout_rate: 0.1
903
+ layer_drop_rate: 0.2
904
+ preprocessor: default
905
+ preprocessor_conf: {}
906
+ required:
907
+ - output_dir
908
+ - token_list
909
+ version: '202310'
910
+ distributed: true
911
+ ```
912
+
913
+ </details>
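The config above pairs `optim: adam` (`lr: 0.001`) with `scheduler: warmuplr` and `warmup_steps: 35000`. The sketch below shows how the learning rate would evolve, assuming the scheduler follows the Noam-style rule `lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)`. The function name `warmup_lr` is illustrative; this is a standalone approximation, not the ESPnet implementation itself.

```python
def warmup_lr(step, base_lr=0.001, warmup_steps=35000):
    """Learning rate at a given optimizer step (1-based) under a Noam-style warmup.

    The rate grows linearly until `warmup_steps`, where it reaches `base_lr`,
    and then decays proportionally to 1/sqrt(step).
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)


for step in (1, 17500, 35000, 140000):
    print(f"step {step:>6}: lr = {warmup_lr(step):.3e}")
```

Under this assumption the peak rate equals the configured `lr` exactly at step 35000, and it has halved again by step 140000 (four times the warmup length).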
914
+
915
+
916
+
917
+ ### Citing ESPnet
918
+
919
+ ```bibtex
920
+ @inproceedings{watanabe2018espnet,
921
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
922
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
923
+ year={2018},
924
+ booktitle={Proceedings of Interspeech},
925
+ pages={2207--2211},
926
+ doi={10.21437/Interspeech.2018-1456},
927
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
928
+ }
929
+
930
+
931
+
932
+
933
+
934
+
935
+ ```
936
+
937
+ or arXiv:
938
+
939
+ ```bibtex
940
+ @misc{watanabe2018espnet,
941
+ title={ESPnet: End-to-End Speech Processing Toolkit},
942
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
943
+ year={2018},
944
+ eprint={1804.00015},
945
+ archivePrefix={arXiv},
946
+ primaryClass={cs.CL}
947
+ }
948
+ ```
exp/slurp_multiconvformer_conv_fusion/RESULTS.md ADDED
@@ -0,0 +1,44 @@
1
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
2
+ # RESULTS
3
+ ## Environments
4
+ - date: `Wed Feb 21 01:04:03 EST 2024`
5
+ - python version: `3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]`
6
+ - espnet version: `espnet 202310`
7
+ - pytorch version: `pytorch 2.1.2+cu118`
8
+ - Git hash: `edb6ec64bb5d4f2c68a3b81674f0c2822e2e5b58`
9
+ - Commit date: `Fri Feb 9 21:26:35 2024 +0530`
10
+
11
+ ## exp/slurp_multiconvformer_conv_fusion
12
+ ### WER
13
+
14
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
15
+ |---|---|---|---|---|---|---|---|---|
16
+ |decode_asr_asr_model_valid.acc.ave_10best/test|13078|262176|84.1|7.5|8.5|2.7|18.7|47.0|
17
+
18
+ ### CER
19
+
20
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
21
+ |---|---|---|---|---|---|---|---|---|
22
+ |decode_asr_asr_model_valid.acc.ave_10best/test|13078|1245475|90.0|3.0|7.0|3.1|13.1|47.0|
23
+
24
+ ### TER
25
+
26
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
27
+ |---|---|---|---|---|---|---|---|---|
28
+ ## exp/slurp_multiconvformer_conv_fusion/decode_asr_asr_model_valid.acc.ave_10best
29
+ ### WER
30
+
31
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
32
+ |---|---|---|---|---|---|---|---|---|
33
+ |org/devel|8690|178058|84.9|7.4|7.7|3.1|18.1|48.7|
34
+
35
+ ### CER
36
+
37
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
38
+ |---|---|---|---|---|---|---|---|---|
39
+ |org/devel|8690|847400|90.9|2.9|6.2|3.4|12.5|48.7|
40
+
41
+ ### TER
42
+
43
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
44
+ |---|---|---|---|---|---|---|---|---|
exp/slurp_multiconvformer_conv_fusion/config.yaml ADDED
@@ -0,0 +1,830 @@
1
+ config: conf/tuning/train_asr_multiconv_e12_mlp3072_linear2048_layerdrop.yaml
2
+ print_config: false
3
+ log_level: INFO
4
+ drop_last_iter: false
5
+ dry_run: false
6
+ iterator_type: sequence
7
+ valid_iterator_type: null
8
+ output_dir: exp/slurp_multiconvformer_conv_fusion
9
+ ngpu: 1
10
+ seed: 0
11
+ num_workers: 1
12
+ num_att_plot: 3
13
+ dist_backend: nccl
14
+ dist_init_method: env://
15
+ dist_world_size: 2
16
+ dist_rank: 0
17
+ local_rank: 0
18
+ dist_master_addr: localhost
19
+ dist_master_port: 40439
20
+ dist_launcher: null
21
+ multiprocessing_distributed: true
22
+ unused_parameters: true
23
+ sharded_ddp: false
24
+ cudnn_enabled: true
25
+ cudnn_benchmark: false
26
+ cudnn_deterministic: true
27
+ collect_stats: false
28
+ write_collected_feats: false
29
+ max_epoch: 60
30
+ patience: null
31
+ val_scheduler_criterion:
32
+ - valid
33
+ - loss
34
+ early_stopping_criterion:
35
+ - valid
36
+ - loss
37
+ - min
38
+ best_model_criterion:
39
+ - - valid
40
+ - acc
41
+ - max
42
+ keep_nbest_models: 10
43
+ nbest_averaging_interval: 0
44
+ grad_clip: 5.0
45
+ grad_clip_type: 2.0
46
+ grad_noise: false
47
+ accum_grad: 1
48
+ no_forward_run: false
49
+ resume: true
50
+ train_dtype: float32
51
+ use_amp: false
52
+ log_interval: null
53
+ use_matplotlib: true
54
+ use_tensorboard: true
55
+ create_graph_in_tensorboard: false
56
+ use_wandb: false
57
+ wandb_project: null
58
+ wandb_id: null
59
+ wandb_entity: null
60
+ wandb_name: null
61
+ wandb_model_log_interval: -1
62
+ detect_anomaly: false
63
+ use_lora: false
64
+ save_lora_only: true
65
+ lora_conf: {}
66
+ pretrain_path: null
67
+ init_param: []
68
+ ignore_init_mismatch: false
69
+ freeze_param: []
70
+ num_iters_per_epoch: null
71
+ batch_size: 64
72
+ valid_batch_size: null
73
+ batch_bins: 1000000
74
+ valid_batch_bins: null
75
+ train_shape_file:
76
+ - exp/asr_stats_raw_en_word/train/speech_shape
77
+ - exp/asr_stats_raw_en_word/train/text_shape.word
78
+ valid_shape_file:
79
+ - exp/asr_stats_raw_en_word/valid/speech_shape
80
+ - exp/asr_stats_raw_en_word/valid/text_shape.word
81
+ batch_type: folded
82
+ valid_batch_type: null
83
+ fold_length:
84
+ - 80000
85
+ - 150
86
+ sort_in_batch: descending
87
+ shuffle_within_batch: false
88
+ sort_batch: descending
89
+ multiple_iterator: false
90
+ chunk_length: 500
91
+ chunk_shift_ratio: 0.5
92
+ num_cache_chunks: 1024
93
+ chunk_excluded_key_prefixes: []
94
+ chunk_default_fs: null
95
+ train_data_path_and_name_and_type:
96
+ - - dump/raw/train/wav.scp
97
+ - speech
98
+ - kaldi_ark
99
+ - - dump/raw/train/text
100
+ - text
101
+ - text
102
+ valid_data_path_and_name_and_type:
103
+ - - dump/raw/devel/wav.scp
104
+ - speech
105
+ - kaldi_ark
106
+ - - dump/raw/devel/text
107
+ - text
108
+ - text
109
+ allow_variable_data_keys: false
110
+ max_cache_size: 0.0
111
+ max_cache_fd: 32
112
+ allow_multi_rates: false
113
+ valid_max_cache_size: null
114
+ exclude_weight_decay: false
115
+ exclude_weight_decay_conf: {}
116
+ optim: adam
117
+ optim_conf:
118
+ lr: 0.001
119
+ weight_decay: 1.0e-06
120
+ scheduler: warmuplr
121
+ scheduler_conf:
122
+ warmup_steps: 35000
123
+ token_list:
124
+ - <blank>
125
+ - <unk>
126
+ - ▁SEP
127
+ - ▁FILL
128
+ - s
129
+ - ▁the
130
+ - a
131
+ - ▁to
132
+ - ▁i
133
+ - ▁me
134
+ - e
135
+ - ▁s
136
+ - ▁a
137
+ - i
138
+ - ▁you
139
+ - ▁what
140
+ - er
141
+ - ing
142
+ - u
143
+ - ▁is
144
+ - ''''
145
+ - o
146
+ - p
147
+ - ▁in
148
+ - ▁p
149
+ - y
150
+ - ▁my
151
+ - ▁please
152
+ - d
153
+ - c
154
+ - m
155
+ - ▁b
156
+ - l
157
+ - ▁m
158
+ - ▁c
159
+ - st
160
+ - date
161
+ - n
162
+ - ▁d
163
+ - le
164
+ - b
165
+ - ▁for
166
+ - re
167
+ - t
168
+ - ▁on
169
+ - en
170
+ - h
171
+ - 'on'
172
+ - ar
173
+ - person
174
+ - ▁re
175
+ - ▁f
176
+ - ▁g
177
+ - ▁of
178
+ - an
179
+ - ▁
180
+ - g
181
+ - ▁today
182
+ - ▁t
183
+ - or
184
+ - ▁it
185
+ - ▁this
186
+ - ▁h
187
+ - r
188
+ - f
189
+ - at
190
+ - ch
191
+ - ce
192
+ - place_name
193
+ - ▁email
194
+ - ▁do
195
+ - es
196
+ - ri
197
+ - ▁e
198
+ - ▁w
199
+ - ic
200
+ - in
201
+ - ▁that
202
+ - event_name
203
+ - ▁play
204
+ - ▁and
205
+ - al
206
+ - ▁n
207
+ - ▁can
208
+ - email_query
209
+ - ve
210
+ - ▁new
211
+ - day
212
+ - it
213
+ - ate
214
+ - ▁from
215
+ - ▁have
216
+ - k
217
+ - time
218
+ - ▁am
219
+ - media_type
220
+ - email_sendemail
221
+ - ent
222
+ - ▁olly
223
+ - qa_factoid
224
+ - se
225
+ - v
226
+ - et
227
+ - ck
228
+ - ▁any
229
+ - calendar_set
230
+ - ly
231
+ - th
232
+ - ▁how
233
+ - ▁meeting
234
+ - ed
235
+ - ▁tell
236
+ - ▁st
237
+ - x
238
+ - ur
239
+ - ro
240
+ - ▁at
241
+ - nd
242
+ - ▁list
243
+ - w
244
+ - ▁u
245
+ - ou
246
+ - ▁not
247
+ - ▁about
248
+ - ▁an
249
+ - ▁o
250
+ - general_negate
251
+ - ut
252
+ - ▁time
253
+ - ▁be
254
+ - ▁ch
255
+ - ▁are
256
+ - social_post
257
+ - business_name
258
+ - la
259
+ - ty
260
+ - play_music
261
+ - ot
262
+ - general_quirky
263
+ - ▁l
264
+ - ▁sh
265
+ - ▁tweet
266
+ - om
267
+ - ▁week
268
+ - um
269
+ - ▁one
270
+ - ter
271
+ - ▁he
272
+ - ▁up
273
+ - ▁com
274
+ - general_praise
275
+ - weather_query
276
+ - ▁next
277
+ - ▁th
278
+ - ▁check
279
+ - calendar_query
280
+ - ▁last
281
+ - ▁ro
282
+ - ad
283
+ - is
284
+ - ▁with
285
+ - ay
286
+ - ▁send
287
+ - pe
288
+ - ▁pm
289
+ - ▁tomorrow
290
+ - ▁j
291
+ - un
292
+ - ▁train
293
+ - general_explain
294
+ - ▁v
295
+ - one
296
+ - ▁r
297
+ - ra
298
+ - news_query
299
+ - ation
300
+ - ▁emails
301
+ - us
302
+ - if
303
+ - ct
304
+ - ▁co
305
+ - ▁add
306
+ - ▁will
307
+ - ▁se
308
+ - nt
309
+ - ▁was
310
+ - ine
311
+ - ▁de
312
+ - ▁set
313
+ - ▁ex
314
+ - ▁would
315
+ - ir
316
+ - ow
317
+ - ber
318
+ - general_repeat
319
+ - ight
320
+ - ook
321
+ - ▁again
322
+ - ▁song
323
+ - currency_name
324
+ - ll
325
+ - ▁ha
326
+ - ▁go
327
+ - relation
328
+ - te
329
+ - ion
330
+ - and
331
+ - ▁y
332
+ - ▁ye
333
+ - general_affirm
334
+ - general_confirm
335
+ - ery
336
+ - ▁po
337
+ - ff
338
+ - ▁we
339
+ - ▁turn
340
+ - ▁did
341
+ - ▁mar
342
+ - ▁alarm
343
+ - ▁like
344
+ - datetime_query
345
+ - ers
346
+ - ▁all
347
+ - ▁remind
348
+ - ▁so
349
+ - qa_definition
350
+ - ▁calendar
351
+ - end
352
+ - ▁said
353
+ - ci
354
+ - ▁off
355
+ - ▁john
356
+ - ▁day
357
+ - ss
358
+ - pla
359
+ - ume
360
+ - ▁get
361
+ - ail
362
+ - pp
363
+ - z
364
+ - ry
365
+ - am
366
+ - ▁need
367
+ - as
368
+ - ▁thank
369
+ - ▁wh
370
+ - ▁want
371
+ - ▁right
372
+ - ▁jo
373
+ - ▁facebook
374
+ - ▁k
375
+ - ge
376
+ - ld
377
+ - ▁fri
378
+ - ▁two
379
+ - general_dontcare
380
+ - ▁news
381
+ - ol
382
+ - oo
383
+ - ant
384
+ - ▁five
385
+ - ▁event
386
+ - ake
387
+ - definition_word
388
+ - transport_type
389
+ - ▁your
390
+ - vi
391
+ - orn
392
+ - op
393
+ - ▁weather
394
+ - ome
395
+ - ▁app
396
+ - ▁lo
397
+ - de
398
+ - ▁music
399
+ - weather_descriptor
400
+ - ak
401
+ - ke
402
+ - ▁there
403
+ - ▁si
404
+ - ▁lights
405
+ - ▁now
406
+ - ▁mo
407
+ - calendar_remove
408
+ - our
409
+ - ▁dollar
410
+ - food_type
411
+ - me
412
+ - ▁more
413
+ - ▁no
414
+ - ▁birthday
415
+ - orrect
416
+ - ▁rep
417
+ - ▁show
418
+ - play_radio
419
+ - ▁mon
420
+ - ▁does
421
+ - ood
422
+ - ag
423
+ - li
424
+ - ▁sto
425
+ - ▁contact
426
+ - cket
427
+ - email_querycontact
428
+ - ▁ev
429
+ - ▁could
430
+ - ange
431
+ - ▁just
432
+ - out
433
+ - ame
434
+ - .
435
+ - ▁ja
436
+ - ▁confirm
437
+ - qa_currency
438
+ - ▁man
439
+ - ▁late
440
+ - ▁think
441
+ - ▁some
442
+ - timeofday
443
+ - ▁bo
444
+ - qa_stock
445
+ - ong
446
+ - ▁start
447
+ - ▁work
448
+ - ▁ten
449
+ - int
450
+ - ▁command
451
+ - all
452
+ - ▁make
453
+ - ▁la
454
+ - j
455
+ - ▁answ
456
+ - ▁hour
457
+ - ▁cle
458
+ - ah
459
+ - ▁find
460
+ - ▁service
461
+ - ▁fa
462
+ - qu
+ - general_commandstop
+ - ai
+ - ▁when
+ - ▁te
+ - ▁by
+ - social_query
+ - ard
+ - ▁tw
+ - ul
+ - id
+ - ▁seven
+ - ▁where
+ - ▁much
+ - art
+ - ▁appointment
+ - ver
+ - artist_name
+ - el
+ - device_type
+ - ▁know
+ - ▁three
+ - ▁events
+ - ▁tr
+ - ▁li
+ - ork
+ - red
+ - ect
+ - ▁let
+ - ▁respon
+ - ▁par
+ - zz
+ - ▁give
+ - ▁twenty
+ - ▁ti
+ - ▁curre
+ - play_podcasts
+ - ▁radio
+ - cooking_recipe
+ - transport_query
+ - ▁con
+ - gh
+ - ▁le
+ - lists_query
+ - ▁rem
+ - recommendation_events
+ - house_place
+ - alarm_set
+ - play_audiobook
+ - ist
+ - ase
+ - music_genre
+ - ive
+ - ast
+ - player_setting
+ - ort
+ - lly
+ - news_topic
+ - list_name
+ - ▁playlist
+ - ▁ne
+ - business_type
+ - personal_info
+ - ind
+ - ust
+ - di
+ - ress
+ - recommendation_locations
+ - lists_createoradd
+ - iot_hue_lightoff
+ - lists_remove
+ - ord
+ - ▁light
+ - ere
+ - alarm_query
+ - audio_volume_mute
+ - music_query
+ - ▁audio
+ - rain
+ - ▁date
+ - ▁order
+ - audio_volume_up
+ - ▁ar
+ - ▁podcast
+ - transport_ticket
+ - mail
+ - iot_hue_lightchange
+ - iot_coffee
+ - radio_name
+ - ill
+ - ▁ri
+ - '@'
+ - takeaway_query
+ - song_name
+ - takeaway_order
+ - ▁ra
+ - email_addcontact
+ - play_game
+ - book
+ - transport_traffic
+ - ▁house
+ - music_likeness
+ - her
+ - transport_taxi
+ - iot_hue_lightdim
+ - ment
+ - ght
+ - fo
+ - order_type
+ - color_type
+ - '1'
+ - ven
+ - ould
+ - general_joke
+ - ess
+ - ain
+ - qa_maths
+ - ▁place
+ - ▁twe
+ - cast
+ - iot_cleaning
+ - ▁che
+ - ▁cont
+ - ith
+ - audiobook_name
+ - email_address
+ - game_name
+ - ▁cal
+ - general_frequency
+ - ▁tom
+ - ▁food
+ - act
+ - iot_hue_lightup
+ - '2'
+ - alarm_remove
+ - podcast_descriptor
+ - ▁definition
+ - audio_volume_down
+ - ▁media
+ - email_folder
+ - dia
+ - meal_type
+ - ▁mus
+ - recommendation_movies
+ - ▁ad
+ - ree
+ - pt
+ - now
+ - playlist_name
+ - ▁person
+ - change_amount
+ - ▁pla
+ - escri
+ - datetime_convert
+ - podcast_name
+ - ▁ab
+ - time_zone
+ - ▁def
+ - ting
+ - iot_wemo_on
+ - music_settings
+ - iot_wemo_off
+ - orre
+ - cy
+ - ank
+ - music_descriptor
+ - lar
+ - app_name
+ - row
+ - joke_type
+ - xt
+ - of
+ - ition
+ - ▁meet
+ - ink
+ - ▁confir
+ - transport_agency
+ - general_greet
+ - ▁business
+ - ▁art
+ - ▁ag
+ - urn
+ - escript
+ - rom
+ - ▁rel
+ - ▁au
+ - ▁currency
+ - audio_volume_other
+ - iot_hue_lighton
+ - ▁artist
+ - '?'
+ - ▁bus
+ - cooking_type
+ - movie_name
+ - coffee_type
+ - ingredient
+ - ather
+ - music_dislikeness
+ - sp
+ - q
+ - ▁ser
+ - esc
+ - ▁bir
+ - ▁cur
+ - name
+ - ▁tran
+ - ▁hou
+ - ek
+ - uch
+ - ▁conf
+ - ▁face
+ - '9'
+ - ▁birth
+ - I
+ - sw
+ - transport_descriptor
+ - ▁comm
+ - lease
+ - transport_name
+ - aid
+ - movie_type
+ - ▁device
+ - alarm_type
+ - audiobook_author
+ - '5'
+ - drink_type
+ - ▁joh
+ - ▁defin
+ - word
+ - ▁curren
+ - order
+ - iness
+ - W
+ - cooking_query
+ - sport_type
+ - ▁relation
+ - oint
+ - H
+ - '8'
+ - A
+ - '0'
+ - ▁dol
+ - vice
+ - ▁pers
+ - '&'
+ - T
+ - ▁appoint
+ - _
+ - '7'
+ - '3'
+ - '-'
+ - game_type
+ - ▁pod
+ - N
+ - M
+ - E
+ - list
+ - music_album
+ - dio
+ - ▁transport
+ - qa_query
+ - C
+ - O
+ - U
+ - query_detail
+ - ']'
+ - '['
+ - descriptor
+ - ':'
+ - spon
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+   dropout_rate: 0.0
+   ctc_type: builtin
+   reduce: true
+   ignore_nan_grad: null
+   zero_infinity: true
+   brctc_risk_strategy: exp
+   brctc_group_strategy: end
+   brctc_risk_factor: 0.0
+ joint_net_conf: null
+ use_preprocessor: true
+ use_lang_prompt: false
+ use_nlp_prompt: false
+ token_type: word
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ aux_ctc_tasks: []
+ frontend: default
+ frontend_conf:
+   fs: 16k
+ specaug: specaug
+ specaug_conf:
+   apply_time_warp: true
+   time_warp_window: 5
+   time_warp_mode: bicubic
+   apply_freq_mask: true
+   freq_mask_width_range:
+   - 0
+   - 30
+   num_freq_mask: 2
+   apply_time_mask: true
+   time_mask_width_range:
+   - 0
+   - 40
+   num_time_mask: 2
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+   ctc_weight: 0.3
+   lsm_weight: 0.1
+   length_normalized_loss: false
+   extract_feats_in_collect_stats: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: multiconv_conformer
+ encoder_conf:
+   output_size: 512
+   attention_heads: 8
+   selfattention_layer_type: rel_selfattn
+   pos_enc_layer_type: rel_pos
+   rel_pos_type: latest
+   cgmlp_linear_units: 3072
+   multicgmlp_type: concat_fusion
+   multicgmlp_kernel_sizes: 7,15,23,31
+   multicgmlp_merge_conv_kernel: 31
+   use_linear_after_conv: false
+   gate_activation: identity
+   num_blocks: 12
+   dropout_rate: 0.1
+   positional_dropout_rate: 0.1
+   attention_dropout_rate: 0.1
+   input_layer: conv2d
+   layer_drop_rate: 0.1
+   linear_units: 1152
+   positionwise_layer_type: linear
+   macaron_style: true
+   use_cnn_module: true
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+   attention_heads: 8
+   linear_units: 2048
+   num_blocks: 6
+   dropout_rate: 0.1
+   positional_dropout_rate: 0.1
+   self_attention_dropout_rate: 0.1
+   src_attention_dropout_rate: 0.1
+   layer_drop_rate: 0.2
+ preprocessor: default
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202310'
+ distributed: true
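The `model_conf` entry `ctc_weight: 0.3` controls how the CTC and attention-decoder losses are mixed during training. The sketch below illustrates that interpolation under the usual hybrid CTC/attention convention (CTC weighted by `ctc_weight`, attention by the remainder). The function name `joint_loss` and the example loss values are hypothetical and are not taken from this commit.

```python
def joint_loss(loss_ctc: float, loss_att: float, ctc_weight: float = 0.3) -> float:
    """Interpolate CTC and attention losses (hypothetical helper).

    Assumes the standard hybrid form:
        loss = ctc_weight * loss_ctc + (1 - ctc_weight) * loss_att
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


# Illustrative values only: with ctc_weight = 0.3 the attention loss
# contributes 70% of the total.
total = joint_loss(loss_ctc=2.0, loss_att=1.0)
```

With `ctc_weight: 0.3`, the attention branch dominates the objective while CTC acts as an alignment regulariser; the `loss_ctc.png` and `loss_att.png` plots listed in this commit track the two terms separately.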
exp/slurp_multiconvformer_conv_fusion/images/acc.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/backward_time.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/cer.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/cer_ctc.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/clip.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/forward_time.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/gpu_max_cached_mem_GB.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/grad_norm.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/iter_time.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/loss.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/loss_att.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/loss_ctc.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/loss_scale.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/optim0_lr0.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/optim_step_time.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/train_time.png ADDED
exp/slurp_multiconvformer_conv_fusion/images/wer.png ADDED
exp/slurp_multiconvformer_conv_fusion/valid.acc.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:61dbb0c661861464e6e7e06eba3ca29137c6f2805e20a1f0b2a031951082530c
+ size 432710206
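The three lines above are a Git LFS pointer, not the checkpoint itself: they record the hash algorithm, the digest, and the byte size of the real `valid.acc.ave_10best.pth`. A minimal sketch of checking a downloaded file against such a pointer, using only the Python standard library, might look like this. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:61dbb0c661861464e6e7e06eba3ca29137c6f2805e20a1f0b2a031951082530c
size 432710206
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines of an LFS pointer into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the pointer's size and digest."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
```

Calling `matches_pointer("exp/slurp_multiconvformer_conv_fusion/valid.acc.ave_10best.pth", pointer)` after a download would then confirm that the roughly 433 MB checkpoint arrived intact.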
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202310'
+ files:
+   asr_model_file: exp/slurp_multiconvformer_conv_fusion/valid.acc.ave_10best.pth
+ python: "3.9.18 (main, Sep 11 2023, 13:41:44) \n[GCC 11.2.0]"
+ timestamp: 1719934095.861915
+ torch: 2.1.2+cu118
+ yaml_files:
+   asr_train_config: exp/slurp_multiconvformer_conv_fusion/config.yaml
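`meta.yaml` ties the checkpoint to its training config and records when the model was packed. The sketch below shows one way those fields could be consumed. The `META` dict is a hand-copied subset of the file above, and `packed_at`, `resolve_paths`, and `load_speech2text` are hypothetical helpers. The last one assumes an ESPnet installation that provides `espnet2.bin.asr_inference.Speech2Text` and recognises the `multiconv_conformer` encoder; it is not exercised here.

```python
from datetime import datetime, timezone
from pathlib import Path

# Subset of meta.yaml, transcribed by hand to avoid a YAML dependency.
META = {
    "espnet": "202310",
    "files": {
        "asr_model_file": "exp/slurp_multiconvformer_conv_fusion/valid.acc.ave_10best.pth",
    },
    "timestamp": 1719934095.861915,
    "torch": "2.1.2+cu118",
    "yaml_files": {
        "asr_train_config": "exp/slurp_multiconvformer_conv_fusion/config.yaml",
    },
}


def packed_at(meta: dict) -> datetime:
    """Interpret the `timestamp` field as Unix epoch seconds (UTC)."""
    return datetime.fromtimestamp(meta["timestamp"], tz=timezone.utc)


def resolve_paths(meta: dict, root=".") -> dict:
    """Join the repo-relative paths in meta.yaml onto a local checkout root."""
    root = Path(root)
    return {
        "asr_train_config": root / meta["yaml_files"]["asr_train_config"],
        "asr_model_file": root / meta["files"]["asr_model_file"],
    }


def load_speech2text(meta: dict, root="."):
    """Assumption: builds an ESPnet inference object from the two paths."""
    from espnet2.bin.asr_inference import Speech2Text  # imported lazily

    paths = resolve_paths(meta, root)
    return Speech2Text(
        asr_train_config=str(paths["asr_train_config"]),
        asr_model_file=str(paths["asr_model_file"]),
    )


print(packed_at(META).isoformat())
```

Under this reading of the `timestamp` field, the model was packed on 2 July 2024 (UTC).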