roshansh-cmu committed on
Commit
257f991
1 Parent(s): 40864c7

Update model

README.md CHANGED
@@ -1,3 +1,1296 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ tags:
3
+ - espnet
4
+ - audio
5
+ - automatic-speech-summarization
6
+ language: en
7
+ datasets:
8
+ - how2
9
+ license: cc-by-4.0
10
+ ---
11
+
12
+ ## ESPnet2 ASR model
13
+
14
+ ### `espnet/roshansh_how2_asr_raw_ft_sum_valid.acc`
15
+
16
+ This model was trained by roshansh-cmu using the how2 recipe in [espnet](https://github.com/espnet/espnet/).
17
+
18
+ ### Demo: How to use in ESPnet2
19
+
20
+ ```bash
21
+ cd espnet
22
+ git checkout e6f42a9783a5d9eba0687c19417f933e890722d7
23
+ pip install -e .
24
+ cd egs2/how2/sum1
25
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/roshansh_how2_asr_raw_ft_sum_valid.acc
26
+ ```
27
+
28
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
29
+ # RESULTS
30
+ ## Environments
31
+ - date: `Mon Feb 7 15:24:21 EST 2022`
32
+ - python version: `3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]`
33
+ - espnet version: `espnet 0.10.6a1`
34
+ - pytorch version: `pytorch 1.10.1`
35
+ - Git hash: `04561cdf3b6c3bc1d51edb04c93b953759ef551d`
36
+ - Commit date: `Mon Feb 7 09:06:12 2022 -0500`
37
+
38
+ ## asr_raw_ft_sum
39
+ |dataset|ROUGE-1|ROUGE-2|ROUGE-L|METEOR|BERTScore|
40
+ |---|---|---|---|---|---|
41
+
42
+ ## ASR config
43
+
44
+ <details><summary>expand</summary>
45
+
46
+ ```
47
+ config: conf/train_asr_conformer_vid_lf.yaml
48
+ print_config: false
49
+ log_level: INFO
50
+ dry_run: false
51
+ iterator_type: sequence
52
+ output_dir: exp/asr_raw_ft_sum
53
+ ngpu: 1
54
+ seed: 0
55
+ num_workers: 1
56
+ num_att_plot: 3
57
+ dist_backend: nccl
58
+ dist_init_method: env://
59
+ dist_world_size: 8
60
+ dist_rank: 0
61
+ local_rank: 0
62
+ dist_master_addr: localhost
63
+ dist_master_port: 45875
64
+ dist_launcher: null
65
+ multiprocessing_distributed: true
66
+ unused_parameters: true
67
+ sharded_ddp: false
68
+ cudnn_enabled: true
69
+ cudnn_benchmark: false
70
+ cudnn_deterministic: true
71
+ collect_stats: false
72
+ write_collected_feats: false
73
+ max_epoch: 100
74
+ patience: 10
75
+ val_scheduler_criterion:
76
+ - valid
77
+ - loss
78
+ early_stopping_criterion:
79
+ - valid
80
+ - loss
81
+ - min
82
+ best_model_criterion:
83
+ - - valid
84
+ - acc
85
+ - max
86
+ keep_nbest_models: 10
87
+ grad_clip: 5.0
88
+ grad_clip_type: 2.0
89
+ grad_noise: false
90
+ accum_grad: 10
91
+ no_forward_run: false
92
+ resume: true
93
+ train_dtype: float32
94
+ use_amp: false
95
+ log_interval: 5000
96
+ use_tensorboard: true
97
+ use_wandb: false
98
+ wandb_project: null
99
+ wandb_id: null
100
+ wandb_entity: null
101
+ wandb_name: null
102
+ wandb_model_log_interval: -1
103
+ detect_anomaly: false
104
+ pretrain_path: null
105
+ init_param:
106
+ - exp/asr_raw_utt_conformer/valid.acc.ave_10best.pth:::ctc
107
+ ignore_init_mismatch: false
108
+ freeze_param: []
109
+ num_iters_per_epoch: null
110
+ batch_size: 20
111
+ valid_batch_size: null
112
+ batch_bins: 60000000
113
+ valid_batch_bins: null
114
+ train_shape_file:
115
+ - exp/asr_stats_raw_vid_sum/train/speech_shape
116
+ - exp/asr_stats_raw_vid_sum/train/text_shape.bpe
117
+ valid_shape_file:
118
+ - exp/asr_stats_raw_vid_sum/valid/speech_shape
119
+ - exp/asr_stats_raw_vid_sum/valid/text_shape.bpe
120
+ batch_type: length
121
+ valid_batch_type: null
122
+ fold_length:
123
+ - 80000
124
+ - 150
125
+ sort_in_batch: descending
126
+ sort_batch: descending
127
+ multiple_iterator: false
128
+ chunk_length: 500
129
+ chunk_shift_ratio: 0.5
130
+ num_cache_chunks: 1024
131
+ train_data_path_and_name_and_type:
132
+ - - dump/raw/tr_2000h_sum_trim/wav.scp
133
+ - speech
134
+ - sound
135
+ - - dump/raw/tr_2000h_sum_trim/text
136
+ - text
137
+ - text
138
+ valid_data_path_and_name_and_type:
139
+ - - dump/raw/cv05_sum_trim/wav.scp
140
+ - speech
141
+ - sound
142
+ - - dump/raw/cv05_sum_trim/text
143
+ - text
144
+ - text
145
+ allow_variable_data_keys: false
146
+ max_cache_size: 0.0
147
+ max_cache_fd: 32
148
+ valid_max_cache_size: null
149
+ optim: adam
150
+ optim_conf:
151
+ lr: 0.001
152
+ scheduler: reducelronplateau
153
+ scheduler_conf:
154
+ mode: min
155
+ factor: 0.5
156
+ patience: 1
157
+ token_list:
158
+ - <blank>
159
+ - <unk>
160
+ - '[hes]'
161
+ - S
162
+ - ▁THE
163
+ - ▁TO
164
+ - ''''
165
+ - ▁AND
166
+ - ▁YOU
167
+ - ▁A
168
+ - ▁IT
169
+ - T
170
+ - ▁THAT
171
+ - ▁OF
172
+ - ▁I
173
+ - ▁IS
174
+ - RE
175
+ - ▁IN
176
+ - ING
177
+ - ▁WE
178
+ - M
179
+ - ▁GOING
180
+ - ▁SO
181
+ - ▁THIS
182
+ - ▁YOUR
183
+ - ▁ON
184
+ - E
185
+ - D
186
+ - ▁BE
187
+ - ▁CAN
188
+ - N
189
+ - Y
190
+ - O
191
+ - ER
192
+ - ▁HAVE
193
+ - ▁JUST
194
+ - ▁FOR
195
+ - ▁WITH
196
+ - ▁DO
197
+ - ED
198
+ - ▁ARE
199
+ - ▁WANT
200
+ - ▁UP
201
+ - R
202
+ - LL
203
+ - P
204
+ - ▁
205
+ - L
206
+ - B
207
+ - ▁IF
208
+ - C
209
+ - ▁ONE
210
+ - ▁S
211
+ - ▁OR
212
+ - A
213
+ - ▁GO
214
+ - ▁LIKE
215
+ - ▁NOW
216
+ - ▁HERE
217
+ - VE
218
+ - LE
219
+ - U
220
+ - ▁GET
221
+ - ▁WHAT
222
+ - ▁OUT
223
+ - IN
224
+ - W
225
+ - ▁C
226
+ - ▁LITTLE
227
+ - ▁THERE
228
+ - LY
229
+ - ▁AS
230
+ - ▁MAKE
231
+ - I
232
+ - ▁THEY
233
+ - ▁MY
234
+ - K
235
+ - ▁THEN
236
+ - ▁BUT
237
+ - AL
238
+ - G
239
+ - ▁ALL
240
+ - OR
241
+ - ▁BACK
242
+ - ▁NOT
243
+ - ▁ABOUT
244
+ - ▁RIGHT
245
+ - ▁OUR
246
+ - EN
247
+ - ▁SOME
248
+ - ▁DOWN
249
+ - F
250
+ - ▁WHEN
251
+ - CH
252
+ - ▁F
253
+ - ▁HOW
254
+ - AR
255
+ - ▁WILL
256
+ - ▁RE
257
+ - CK
258
+ - ▁G
259
+ - ES
260
+ - CE
261
+ - ▁TAKE
262
+ - ▁AT
263
+ - ▁FROM
264
+ - ▁WAY
265
+ - TER
266
+ - ▁SEE
267
+ - RA
268
+ - ▁USE
269
+ - ▁REALLY
270
+ - RI
271
+ - TH
272
+ - ▁TWO
273
+ - ▁ME
274
+ - ▁VERY
275
+ - ▁E
276
+ - ▁B
277
+ - AT
278
+ - ▁THEM
279
+ - ▁DON
280
+ - ▁AN
281
+ - ▁BECAUSE
282
+ - ▁MORE
283
+ - RO
284
+ - H
285
+ - 'ON'
286
+ - LI
287
+ - ▁PUT
288
+ - ▁ST
289
+ - IL
290
+ - ▁BIT
291
+ - ▁START
292
+ - ▁NEED
293
+ - ▁INTO
294
+ - UR
295
+ - ▁TIME
296
+ - ▁OVER
297
+ - ▁W
298
+ - ▁DE
299
+ - ▁LOOK
300
+ - ▁THESE
301
+ - ▁LET
302
+ - ▁GOOD
303
+ - ▁ALSO
304
+ - AN
305
+ - ▁OFF
306
+ - ▁HE
307
+ - ▁KIND
308
+ - ▁SIDE
309
+ - ▁CO
310
+ - ▁SURE
311
+ - ▁AGAIN
312
+ - ▁MA
313
+ - ▁KNOW
314
+ - IT
315
+ - ▁WOULD
316
+ - IC
317
+ - ▁OTHER
318
+ - LA
319
+ - ▁P
320
+ - ▁WHICH
321
+ - '-'
322
+ - IR
323
+ - ▁LA
324
+ - ▁HAND
325
+ - EL
326
+ - ▁LOT
327
+ - ▁WHERE
328
+ - ▁THREE
329
+ - ▁PA
330
+ - ION
331
+ - LO
332
+ - ▁KEEP
333
+ - ▁SHOW
334
+ - ▁THING
335
+ - ▁FIRST
336
+ - TE
337
+ - ENT
338
+ - ATE
339
+ - ▁COME
340
+ - AD
341
+ - ▁GOT
342
+ - NG
343
+ - ▁NICE
344
+ - ▁T
345
+ - ET
346
+ - ▁MO
347
+ - ▁ANY
348
+ - ▁ACTUALLY
349
+ - ▁DIFFERENT
350
+ - ▁SE
351
+ - GE
352
+ - ▁WORK
353
+ - ▁THROUGH
354
+ - ▁O
355
+ - KE
356
+ - V
357
+ - ▁AROUND
358
+ - ▁BA
359
+ - PE
360
+ - ▁HI
361
+ - ▁BY
362
+ - SH
363
+ - ATION
364
+ - ▁SU
365
+ - ▁CA
366
+ - ▁D
367
+ - ▁LO
368
+ - ▁HAS
369
+ - ▁LI
370
+ - ▁PLAY
371
+ - Z
372
+ - ▁ADD
373
+ - ▁RO
374
+ - ▁TA
375
+ - AS
376
+ - ▁FOUR
377
+ - ▁CON
378
+ - ▁THOSE
379
+ - MP
380
+ - NE
381
+ - ▁SP
382
+ - UT
383
+ - ▁GIVE
384
+ - ▁WELL
385
+ - ▁BALL
386
+ - TING
387
+ - RY
388
+ - X
389
+ - ▁HO
390
+ - INE
391
+ - IVE
392
+ - ▁NEXT
393
+ - ▁PO
394
+ - ▁STEP
395
+ - ▁EVEN
396
+ - TION
397
+ - ▁MI
398
+ - MENT
399
+ - ▁CUT
400
+ - ▁BO
401
+ - ▁LINE
402
+ - ▁MUCH
403
+ - ▁THINGS
404
+ - ▁TALK
405
+ - UN
406
+ - ▁PART
407
+ - ▁WAS
408
+ - ▁FA
409
+ - ▁SOMETHING
410
+ - PP
411
+ - ANCE
412
+ - ND
413
+ - DI
414
+ - ▁RA
415
+ - AGE
416
+ - ▁SAME
417
+ - ▁EXPERT
418
+ - ▁DOING
419
+ - ▁LEFT
420
+ - IST
421
+ - ▁DI
422
+ - ▁NO
423
+ - RU
424
+ - ME
425
+ - TA
426
+ - UL
427
+ - TI
428
+ - ▁VILLAGE
429
+ - DE
430
+ - ERS
431
+ - ▁PEOPLE
432
+ - ▁TURN
433
+ - VER
434
+ - ▁FL
435
+ - ▁LEG
436
+ - ▁ONCE
437
+ - ▁COLOR
438
+ - ▁PULL
439
+ - ▁USING
440
+ - VI
441
+ - ▁WATER
442
+ - ▁SHE
443
+ - ▁TOP
444
+ - ▁OKAY
445
+ - ▁ANOTHER
446
+ - ▁THEIR
447
+ - ▁SAY
448
+ - URE
449
+ - ▁HA
450
+ - ▁IMPORTANT
451
+ - ▁PIECE
452
+ - ▁FOOT
453
+ - ▁TRA
454
+ - ▁SC
455
+ - ▁BODY
456
+ - ▁SET
457
+ - ▁POINT
458
+ - ▁HELP
459
+ - ▁TODAY
460
+ - ▁BRING
461
+ - ▁V
462
+ - ▁END
463
+ - MA
464
+ - ▁CH
465
+ - ▁MOST
466
+ - ▁K
467
+ - ▁AHEAD
468
+ - ▁HER
469
+ - OL
470
+ - ▁SA
471
+ - AM
472
+ - IES
473
+ - ▁THINK
474
+ - ▁NAME
475
+ - ▁TRY
476
+ - ▁MOVE
477
+ - ONE
478
+ - ▁LE
479
+ - ▁TOO
480
+ - TO
481
+ - UM
482
+ - ▁PLACE
483
+ - ▁COULD
484
+ - ▁FIND
485
+ - ▁FIVE
486
+ - ▁ALWAYS
487
+ - ID
488
+ - TY
489
+ - NT
490
+ - ▁FEEL
491
+ - ▁HEAD
492
+ - ▁THAN
493
+ - NA
494
+ - ▁EX
495
+ - ▁EYE
496
+ - ITY
497
+ - CI
498
+ - OP
499
+ - ▁SHOULD
500
+ - ▁MIGHT
501
+ - ▁HOLD
502
+ - ▁CAR
503
+ - AND
504
+ - ▁GREAT
505
+ - ▁RI
506
+ - ▁BU
507
+ - ▁HIGH
508
+ - ▁OPEN
509
+ - ▁BEFORE
510
+ - US
511
+ - ▁FRONT
512
+ - ▁LONG
513
+ - ▁TOGETHER
514
+ - NI
515
+ - ▁HAIR
516
+ - ▁LIGHT
517
+ - ▁TEN
518
+ - ▁HIT
519
+ - EST
520
+ - OUS
521
+ - ▁PRETTY
522
+ - ▁TYPE
523
+ - IP
524
+ - CO
525
+ - ▁FINGER
526
+ - ▁JO
527
+ - ▁UN
528
+ - ▁PRO
529
+ - ▁STRAIGHT
530
+ - ▁BEHALF
531
+ - ▁TI
532
+ - ▁SIX
533
+ - ▁CLEAN
534
+ - ▁DIS
535
+ - ▁DA
536
+ - ▁POSITION
537
+ - IGHT
538
+ - ACT
539
+ - ▁CHA
540
+ - ▁PE
541
+ - GG
542
+ - AP
543
+ - ▁MEAN
544
+ - ▁COMP
545
+ - FI
546
+ - ▁KNEE
547
+ - ▁CALLED
548
+ - ▁HANDS
549
+ - ▁PRE
550
+ - ▁FORWARD
551
+ - ▁AREA
552
+ - ANT
553
+ - ▁TE
554
+ - ▁WA
555
+ - ▁AFTER
556
+ - ▁SMALL
557
+ - ▁THROW
558
+ - ▁EVERY
559
+ - ▁SHOULDER
560
+ - NC
561
+ - PER
562
+ - ▁MAYBE
563
+ - ▁ABLE
564
+ - ▁BASICALLY
565
+ - ▁AM
566
+ - ▁READY
567
+ - ▁BOTTOM
568
+ - IE
569
+ - ▁HALF
570
+ - FF
571
+ - ▁BIG
572
+ - ▁EACH
573
+ - ▁PUSH
574
+ - ▁EIGHT
575
+ - ▁NEW
576
+ - ▁DONE
577
+ - ▁MAY
578
+ - ▁GETTING
579
+ - HO
580
+ - ▁HIS
581
+ - ▁HARD
582
+ - ▁CLOSE
583
+ - ALLY
584
+ - ▁SECOND
585
+ - ▁FEET
586
+ - ICAL
587
+ - ▁JA
588
+ - ▁PAINT
589
+ - ▁LEARN
590
+ - ▁SOUND
591
+ - HE
592
+ - ▁ROLL
593
+ - ▁ONLY
594
+ - ▁DOESN
595
+ - WA
596
+ - ▁DRAW
597
+ - ▁VI
598
+ - ▁DID
599
+ - ▁SHA
600
+ - ▁CENTER
601
+ - CU
602
+ - ▁CLIP
603
+ - ▁PI
604
+ - ▁CARD
605
+ - ▁INSIDE
606
+ - ▁PERSON
607
+ - ▁STILL
608
+ - ▁MAKING
609
+ - 'NO'
610
+ - ▁EVERYTHING
611
+ - .
612
+ - ▁FUN
613
+ - ARD
614
+ - ▁REMEMBER
615
+ - ▁AWAY
616
+ - ATED
617
+ - COM
618
+ - ▁SEVEN
619
+ - ▁BEEN
620
+ - ▁MANY
621
+ - ABLE
622
+ - ▁DAY
623
+ - ▁SIT
624
+ - IZE
625
+ - ▁REAL
626
+ - ▁HIP
627
+ - ▁BASIC
628
+ - ▁KICK
629
+ - ▁TU
630
+ - ATING
631
+ - ▁STICK
632
+ - ▁FLAT
633
+ - ▁WHO
634
+ - END
635
+ - HA
636
+ - ▁EXP
637
+ - ▁PICK
638
+ - ▁MIX
639
+ - ▁TRI
640
+ - ▁BI
641
+ - ▁WHOLE
642
+ - ▁STRETCH
643
+ - ▁BOTH
644
+ - ▁PROBABLY
645
+ - CA
646
+ - ▁HIM
647
+ - ▁STRING
648
+ - ▁EDGE
649
+ - ▁BASE
650
+ - ▁COMING
651
+ - UGH
652
+ - ▁LIFT
653
+ - ▁STA
654
+ - ▁WORKING
655
+ - ▁MU
656
+ - ▁QUICK
657
+ - ▁SOMETIMES
658
+ - ▁HAPPEN
659
+ - ▁YOURSELF
660
+ - ▁TALKING
661
+ - ▁DR
662
+ - ▁TELL
663
+ - ▁ANYTHING
664
+ - ▁BRA
665
+ - ▁LOOKING
666
+ - ▁SLOW
667
+ - ▁NE
668
+ - ▁STAND
669
+ - NER
670
+ - ▁COMES
671
+ - ▁GOES
672
+ - ISE
673
+ - BE
674
+ - ▁USED
675
+ - ▁UNDER
676
+ - ▁BETWEEN
677
+ - ▁HU
678
+ - ▁CREATE
679
+ - ▁NA
680
+ - ▁USUALLY
681
+ - ▁ARM
682
+ - ▁DRY
683
+ - ▁RUN
684
+ - LING
685
+ - ▁BRUSH
686
+ - ▁COVER
687
+ - ▁HEAR
688
+ - ▁DOES
689
+ - ▁STAY
690
+ - ▁EN
691
+ - ▁FOLD
692
+ - ▁CHANGE
693
+ - ▁LAST
694
+ - ▁EASY
695
+ - ▁US
696
+ - ▁PER
697
+ - ▁FACE
698
+ - ▁EAR
699
+ - ▁TIGHT
700
+ - ▁FE
701
+ - ▁PIN
702
+ - ▁MAN
703
+ - ▁BETTER
704
+ - ▁CALL
705
+ - ▁PRI
706
+ - ▁BEST
707
+ - ▁KI
708
+ - ▁COUPLE
709
+ - ▁WHILE
710
+ - ▁SHAPE
711
+ - ▁GAME
712
+ - IV
713
+ - ▁SHOT
714
+ - ▁PAPER
715
+ - ▁OWN
716
+ - ▁ALRIGHT
717
+ - ▁HAD
718
+ - TIC
719
+ - ▁BREATH
720
+ - ▁TOOL
721
+ - '2'
722
+ - ▁ENOUGH
723
+ - ▁COURSE
724
+ - ▁SKIN
725
+ - ▁SPIN
726
+ - ▁VA
727
+ - ▁ARMS
728
+ - ▁TEA
729
+ - ▁BREAK
730
+ - ▁DOG
731
+ - ▁1
732
+ - QUE
733
+ - ▁DROP
734
+ - ▁NUMBER
735
+ - IG
736
+ - ▁RED
737
+ - ▁NOTE
738
+ - ▁WEIGHT
739
+ - WARD
740
+ - ▁PLAYING
741
+ - ▁FINISH
742
+ - ▁MINUTE
743
+ - ▁R
744
+ - ▁PRESS
745
+ - ▁EITHER
746
+ - ▁CHE
747
+ - ▁PU
748
+ - BER
749
+ - ▁FEW
750
+ - ▁SIZE
751
+ - ▁MADE
752
+ - ▁LEAVE
753
+ - ▁GA
754
+ - ▁ALREADY
755
+ - ▁GUY
756
+ - ▁FAR
757
+ - ▁HOME
758
+ - ▁BAR
759
+ - UP
760
+ - ▁GRAB
761
+ - ▁MARK
762
+ - ▁WHITE
763
+ - ▁PROPER
764
+ - ▁CAUSE
765
+ - ▁OK
766
+ - ▁ART
767
+ - HI
768
+ - ▁SORT
769
+ - ▁EXERCISE
770
+ - ▁LOWER
771
+ - PORT
772
+ - ▁PLANT
773
+ - ▁BOARD
774
+ - ▁CASE
775
+ - ▁YEAR
776
+ - CENT
777
+ - ▁DU
778
+ - ▁CHECK
779
+ - ▁WHATEVER
780
+ - ▁OIL
781
+ - ▁IDEA
782
+ - ▁SIMPLE
783
+ - ▁PRACTICE
784
+ - ▁FAST
785
+ - '0'
786
+ - ▁CONTROL
787
+ - ▁J
788
+ - ▁KEY
789
+ - ▁MIDDLE
790
+ - ▁FULL
791
+ - ▁GLASS
792
+ - ▁OUTSIDE
793
+ - ▁LOW
794
+ - ▁REST
795
+ - ▁STUFF
796
+ - ▁ACT
797
+ - ▁UNTIL
798
+ - ▁BLACK
799
+ - ▁POP
800
+ - ▁CLICK
801
+ - ▁HOLE
802
+ - ▁Z
803
+ - ▁COUNT
804
+ - ▁POT
805
+ - ▁ALLOW
806
+ - ▁HAVING
807
+ - ▁TRYING
808
+ - ▁MUSCLE
809
+ - ▁GU
810
+ - ▁BOX
811
+ - ▁NOTICE
812
+ - ▁EXAMPLE
813
+ - UND
814
+ - ▁ALONG
815
+ - FUL
816
+ - ISH
817
+ - ▁STORE
818
+ - ▁LU
819
+ - ▁FLOOR
820
+ - ▁MOVING
821
+ - ▁LARGE
822
+ - ▁STOP
823
+ - ▁PH
824
+ - ▁WALK
825
+ - '5'
826
+ - ▁QU
827
+ - ▁TECHNIQUE
828
+ - ▁SOFT
829
+ - ▁GROUND
830
+ - ▁JUMP
831
+ - ▁JU
832
+ - ▁FILL
833
+ - ▁WHY
834
+ - ▁BUY
835
+ - ▁GREEN
836
+ - ▁WALL
837
+ - ▁HEEL
838
+ - NESS
839
+ - ▁LEVEL
840
+ - ▁UNDERNEATH
841
+ - ▁PATTERN
842
+ - ▁BEHIND
843
+ - ▁OLD
844
+ - ▁TIP
845
+ - ▁COMPLETE
846
+ - ▁WON
847
+ - ▁TEACH
848
+ - ▁FIT
849
+ - ▁NECK
850
+ - ▁REMOVE
851
+ - ▁TRICK
852
+ - ▁MOVEMENT
853
+ - ▁TOWARDS
854
+ - ▁PARTICULAR
855
+ - ▁CHI
856
+ - ▁EFFECT
857
+ - J
858
+ - ▁FREE
859
+ - ▁ACROSS
860
+ - ▁BEND
861
+ - ▁SAFE
862
+ - ▁SLIDE
863
+ - ▁PROBLEM
864
+ - ▁BLOCK
865
+ - ▁PAN
866
+ - ▁NATURAL
867
+ - ▁TOUCH
868
+ - ▁CHILD
869
+ - LINE
870
+ - ▁CROSS
871
+ - ▁REASON
872
+ - '4'
873
+ - ▁POWER
874
+ - ▁APPLY
875
+ - ▁FOLLOW
876
+ - ▁DESIGN
877
+ - ▁SPACE
878
+ - ▁ORDER
879
+ - ▁WOOD
880
+ - ▁RID
881
+ - '3'
882
+ - ▁COOK
883
+ - ▁BEGIN
884
+ - ▁WATCH
885
+ - ▁STYLE
886
+ - QUA
887
+ - ▁PRODUCT
888
+ - ▁TAKING
889
+ - ▁PUTTING
890
+ - ▁EXHALE
891
+ - ▁THOUGH
892
+ - ▁DEEP
893
+ - IAN
894
+ - ▁REACH
895
+ - ▁FOOD
896
+ - ▁ALMOST
897
+ - ▁COOL
898
+ - ▁SECTION
899
+ - ▁SAID
900
+ - ▁ANGLE
901
+ - ▁MUSIC
902
+ - ▁RELAX
903
+ - ▁CORNER
904
+ - ▁DARK
905
+ - ▁CHORD
906
+ - ▁ESPECIALLY
907
+ - ▁SCALE
908
+ - ▁WARM
909
+ - ▁WITHOUT
910
+ - ▁WHEEL
911
+ - ▁SEGMENT
912
+ - ▁TABLE
913
+ - ▁BOOK
914
+ - ▁PASS
915
+ - ▁ELBOW
916
+ - ▁ROUND
917
+ - ▁INHALE
918
+ - ▁SMOOTH
919
+ - ▁ROOM
920
+ - /
921
+ - ▁NINE
922
+ - ▁SHORT
923
+ - ▁MEASURE
924
+ - ▁LESS
925
+ - ▁TWIST
926
+ - ▁BALANCE
927
+ - ▁PROCESS
928
+ - ▁SWITCH
929
+ - ▁GENERAL
930
+ - ▁CLAY
931
+ - ▁CERTAIN
932
+ - ▁NEVER
933
+ - ▁BLUE
934
+ - ▁CUP
935
+ - ▁HOUSE
936
+ - ▁EXTRA
937
+ - ▁MOTION
938
+ - ▁PRESSURE
939
+ - ▁FIRE
940
+ - ▁SIMPLY
941
+ - ▁DOUBLE
942
+ - ▁TWENTY
943
+ - ▁CATCH
944
+ - ▁BECOME
945
+ - ▁BUILD
946
+ - ▁SPEED
947
+ - ▁TRANS
948
+ - ▁DRUM
949
+ - ▁CHEST
950
+ - ▁PICTURE
951
+ - ▁LENGTH
952
+ - ▁CONTINUE
953
+ - ▁COMFORTABLE
954
+ - ▁FISH
955
+ - ▁PHOTO
956
+ - ▁LOOSE
957
+ - ▁SKI
958
+ - ▁LIFE
959
+ - ▁DEGREE
960
+ - ▁OPTION
961
+ - ▁WORD
962
+ - ▁SHARP
963
+ - ▁SHOOT
964
+ - ▁FOUND
965
+ - ▁STRONG
966
+ - ▁QUITE
967
+ - ▁THIRD
968
+ - ▁GLUE
969
+ - ▁MIND
970
+ - ▁DEFINITELY
971
+ - ▁EASIER
972
+ - GRAPH
973
+ - ▁HOOK
974
+ - ▁CLEAR
975
+ - ▁POSE
976
+ - ▁BUTTON
977
+ - ▁CHOOSE
978
+ - ▁THICK
979
+ - ▁SYSTEM
980
+ - ▁PERFECT
981
+ - ▁BEAUTIFUL
982
+ - ▁SPOT
983
+ - ▁GROW
984
+ - ▁SIGN
985
+ - ▁ELSE
986
+ - ▁CONNECT
987
+ - ▁SELECT
988
+ - ▁PUNCH
989
+ - ▁DIRECTION
990
+ - ▁WRAP
991
+ - ▁RELEASE
992
+ - QUI
993
+ - SIDE
994
+ - ▁CAREFUL
995
+ - ▁VIDEO
996
+ - ▁INSTEAD
997
+ - ▁CIRCLE
998
+ - ▁WIRE
999
+ - ▁NOSE
1000
+ - ▁AMOUNT
1001
+ - ▁FOCUS
1002
+ - ▁NORMAL
1003
+ - ▁MAJOR
1004
+ - ▁WHETHER
1005
+ - ▁SURFACE
1006
+ - ▁THUMB
1007
+ - ▁DRIVE
1008
+ - ▁SCREW
1009
+ - ▁POSSIBLE
1010
+ - ▁OBVIOUSLY
1011
+ - ▁COMMON
1012
+ - ▁REGULAR
1013
+ - ▁ADJUST
1014
+ - ▁WIDE
1015
+ - ▁BLADE
1016
+ - ▁FRET
1017
+ - ▁RECOMMEND
1018
+ - ▁BOWL
1019
+ - BOARD
1020
+ - ▁IMAGE
1021
+ - ▁DEPENDING
1022
+ - ▁PROTECT
1023
+ - ▁CLOTH
1024
+ - ▁HEALTH
1025
+ - ▁WRIST
1026
+ - ▁CLUB
1027
+ - ▁DRINK
1028
+ - ▁SINCE
1029
+ - ▁FRIEND
1030
+ - '00'
1031
+ - ▁RUNNING
1032
+ - ▁ITSELF
1033
+ - ▁RECORD
1034
+ - ▁SWING
1035
+ - ▁DIRECT
1036
+ - ▁MATERIAL
1037
+ - ▁YO
1038
+ - ▁LEAST
1039
+ - ▁EXACTLY
1040
+ - ▁BEGINNING
1041
+ - ▁SLIGHTLY
1042
+ - ▁TREAT
1043
+ - ▁CAMERA
1044
+ - ▁QUARTER
1045
+ - ▁WINDOW
1046
+ - '8'
1047
+ - ▁SOMEBODY
1048
+ - ▁BURN
1049
+ - ▁DEMONSTRATE
1050
+ - ▁DIFFERENCE
1051
+ - ▁COMPUTER
1052
+ - IBLE
1053
+ - ▁SHOE
1054
+ - ▁PERFORM
1055
+ - ▁SQUARE
1056
+ - ▁CONSIDER
1057
+ - ▁DRILL
1058
+ - ▁TEXT
1059
+ - ▁FILE
1060
+ - ▁RUB
1061
+ - ▁FABRIC
1062
+ - ▁HUNDRED
1063
+ - ▁GRIP
1064
+ - ▁CHARACTER
1065
+ - ▁SPECIFIC
1066
+ - ▁KNOT
1067
+ - ▁CURL
1068
+ - ▁STITCH
1069
+ - ▁BLEND
1070
+ - ▁FRAME
1071
+ - ▁THIRTY
1072
+ - '1'
1073
+ - ▁HORSE
1074
+ - ▁ATTACH
1075
+ - ▁GROUP
1076
+ - ▁STROKE
1077
+ - ▁GUITAR
1078
+ - ▁APART
1079
+ - ▁MACHINE
1080
+ - ▁CLASS
1081
+ - ▁COMB
1082
+ - ▁ROOT
1083
+ - ▁HELLO
1084
+ - ▁ENERGY
1085
+ - ▁ATTACK
1086
+ - ▁CORRECT
1087
+ - ▁EXTEND
1088
+ - ▁MINOR
1089
+ - ▁PROFESSIONAL
1090
+ - ▁MONEY
1091
+ - ▁STRIP
1092
+ - ▁FLAVOR
1093
+ - ▁EVERYBODY
1094
+ - ▁RULE
1095
+ - ▁DIFFICULT
1096
+ - ▁PROJECT
1097
+ - ▁DISCUSS
1098
+ - ▁FIGURE
1099
+ - ▁HOWEVER
1100
+ - ▁FINAL
1101
+ - ▁STRENGTH
1102
+ - ▁ENTIRE
1103
+ - ▁FIELD
1104
+ - ▁CONTACT
1105
+ - ▁SUPPORT
1106
+ - ▁PALM
1107
+ - ▁SERIES
1108
+ - ▁ENJOY
1109
+ - '6'
1110
+ - ▁WORLD
1111
+ - ▁DECIDE
1112
+ - ▁SPEAK
1113
+ - ▁SEVERAL
1114
+ - ▁WRITE
1115
+ - ▁PROGRAM
1116
+ - ABILITY
1117
+ - ▁KNIFE
1118
+ - ▁PLASTIC
1119
+ - ▁ORGAN
1120
+ - '7'
1121
+ - ▁UNDERSTAND
1122
+ - ▁FIFTEEN
1123
+ - ▁FLEX
1124
+ - ▁INFORMATION
1125
+ - ▁TWELVE
1126
+ - ▁DETAIL
1127
+ - ▁STRIKE
1128
+ - ▁ACTUAL
1129
+ - ▁SPRAY
1130
+ - ▁LOCAL
1131
+ - ▁MOUTH
1132
+ - ▁NIGHT
1133
+ - ▁VEHICLE
1134
+ - ▁OPPOSITE
1135
+ - ▁SCHOOL
1136
+ - '9'
1137
+ - ▁QUESTION
1138
+ - ▁SPECIAL
1139
+ - ▁BIGGER
1140
+ - ▁DEVELOP
1141
+ - ▁PEPPER
1142
+ - ▁PREFER
1143
+ - Q
1144
+ - '%'
1145
+ - ']'
1146
+ - '['
1147
+ - '&'
1148
+ - ','
1149
+ - _
1150
+ - '#'
1151
+ - '='
1152
+ - '@'
1153
+ - +
1154
+ - '*'
1155
+ - $
1156
+ - '~'
1157
+ - <sos/eos>
1158
+ init: null
1159
+ input_size: null
1160
+ ctc_conf:
1161
+ ignore_nan_grad: true
1162
+ model_conf:
1163
+ ctc_weight: 0.0
1164
+ lsm_weight: 0.15
1165
+ length_normalized_loss: false
1166
+ use_preprocessor: true
1167
+ token_type: bpe
1168
+ bpemodel: data/en_token_list/bpe_unigram1000/bpe.model
1169
+ non_linguistic_symbols: data/nlsyms
1170
+ cleaner: null
1171
+ g2p: null
1172
+ speech_volume_normalize: null
1173
+ rir_scp: null
1174
+ rir_apply_prob: 1.0
1175
+ noise_scp: null
1176
+ noise_apply_prob: 1.0
1177
+ noise_db_range: '13_15'
1178
+ frontend: default
1179
+ frontend_conf:
1180
+ n_fft: 512
1181
+ hop_length: 256
1182
+ fs: 16k
1183
+ specaug: specaug
1184
+ specaug_conf:
1185
+ apply_time_warp: true
1186
+ time_warp_window: 5
1187
+ time_warp_mode: bicubic
1188
+ apply_freq_mask: true
1189
+ freq_mask_width_range:
1190
+ - 0
1191
+ - 30
1192
+ num_freq_mask: 2
1193
+ apply_time_mask: true
1194
+ time_mask_width_range:
1195
+ - 0
1196
+ - 40
1197
+ num_time_mask: 2
1198
+ normalize: global_mvn
1199
+ normalize_conf:
1200
+ stats_file: exp/asr_stats_raw_vid_sum/train/feats_stats.npz
1201
+ preencoder: null
1202
+ preencoder_conf: {}
1203
+ encoder: conformer
1204
+ encoder_conf:
1205
+ output_size: 512
1206
+ attention_heads: 8
1207
+ linear_units: 2048
1208
+ num_blocks: 12
1209
+ dropout_rate: 0.1
1210
+ positional_dropout_rate: 0.1
1211
+ attention_dropout_rate: 0.1
1212
+ input_layer: conv2d
1213
+ normalize_before: true
1214
+ macaron_style: true
1215
+ pos_enc_layer_type: abs_pos
1216
+ selfattention_layer_type: lf_selfattn
1217
+ activation_type: swish
1218
+ use_cnn_module: true
1219
+ cnn_module_kernel: 31
1220
+ attention_windows:
1221
+ - 40
1222
+ - 40
1223
+ - 40
1224
+ - 40
1225
+ - 40
1226
+ - 40
1227
+ - 40
1228
+ - 40
1229
+ - 40
1230
+ - 40
1231
+ - 40
1232
+ - 40
1233
+ attention_dilation:
1234
+ - 1
1235
+ - 1
1236
+ - 1
1237
+ - 1
1238
+ - 1
1239
+ - 1
1240
+ - 1
1241
+ - 1
1242
+ - 1
1243
+ - 1
1244
+ - 1
1245
+ - 1
1246
+ attention_mode: tvm
1247
+ decoder: transformer
1248
+ decoder_conf:
1249
+ attention_heads: 4
1250
+ linear_units: 512
1251
+ num_blocks: 6
1252
+ dropout_rate: 0.15
1253
+ positional_dropout_rate: 0.15
1254
+ self_attention_dropout_rate: 0.15
1255
+ src_attention_dropout_rate: 0.15
1256
+ required:
1257
+ - output_dir
1258
+ - token_list
1259
+ version: 0.10.0
1260
+ distributed: true
1261
+ ```
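The config above pairs `scheduler: reducelronplateau` with `mode: min`, `factor: 0.5`, and `patience: 1`, driven by the validation loss (`val_scheduler_criterion`). As a rough illustration of what those values imply, the sketch below simulates the learning rate over a made-up loss curve. It is a simplified stand-in, not the PyTorch scheduler itself: the function name `simulate_plateau_lr` and the loss values are assumptions, and the real scheduler's threshold, cooldown, and minimum learning rate options are ignored.

```python
from typing import List, Sequence


def simulate_plateau_lr(
    losses: Sequence[float],
    lr: float = 0.001,      # optim_conf.lr from the config
    factor: float = 0.5,    # scheduler_conf.factor
    patience: int = 1,      # scheduler_conf.patience
) -> List[float]:
    """Return the learning rate in effect after each epoch.

    Simplified assumption: any strictly lower loss counts as an
    improvement (mode: min); thresholds and cooldown are not modelled.
    """
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in losses:
        if loss < best:
            # New best validation loss: reset the stall counter.
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        # The rate drops once the stall outlasts the patience window.
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


if __name__ == "__main__":
    # Hypothetical validation losses, one per epoch.
    for epoch, rate in enumerate(
        simulate_plateau_lr([1.0, 0.9, 0.95, 0.96, 0.8, 0.85, 0.9]), start=1
    ):
        print(epoch, rate)
```

With `patience: 1`, the rate is halved after the second consecutive epoch without improvement, so a long plateau shrinks it quickly from the initial `0.001`.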
1262
+
1263
+ </details>
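The `init_param` entry in the config, `exp/asr_raw_utt_conformer/valid.acc.ave_10best.pth:::ctc`, appears to follow ESPnet2's colon-separated `<file_path>:<src_key>:<dst_key>:<exclude_keys>` layout. Read that way, it loads the whole averaged checkpoint and skips parameters under `ctc`, which fits `ctc_weight: 0.0`. The sketch below only splits such a string into its fields. `parse_init_param` and `InitParam` are hypothetical names for illustration, not ESPnet functions, and no checkpoint is loaded.

```python
from typing import NamedTuple, Optional, Tuple


class InitParam(NamedTuple):
    path: str                   # checkpoint file to load
    src_key: Optional[str]      # sub-module prefix in the checkpoint, if any
    dst_key: Optional[str]      # sub-module prefix in the target model, if any
    excludes: Tuple[str, ...]   # key prefixes to skip when loading


def parse_init_param(spec: str) -> InitParam:
    """Split '<file_path>:<src_key>:<dst_key>:<exclude_keys>' into fields.

    Hypothetical helper: missing trailing fields are treated as empty,
    and multiple exclude prefixes are assumed to be comma-separated.
    """
    parts = spec.split(":")
    if len(parts) > 4:
        raise ValueError(f"too many ':'-separated fields in {spec!r}")
    parts += [""] * (4 - len(parts))
    path, src_key, dst_key, excludes = parts
    return InitParam(
        path=path,
        src_key=src_key or None,
        dst_key=dst_key or None,
        excludes=tuple(e for e in excludes.split(",") if e),
    )


if __name__ == "__main__":
    print(parse_init_param(
        "exp/asr_raw_utt_conformer/valid.acc.ave_10best.pth:::ctc"
    ))
```

Empty `src_key` and `dst_key` fields mean the full state dict is mapped onto the full model, so only the exclude list narrows what is copied.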
1264
+
1265
+
1266
+
1267
+ ### Citing ESPnet
1268
+
1269
+ ```bibtex
1270
+ @inproceedings{watanabe2018espnet,
1271
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
1272
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
1273
+ year={2018},
1274
+ booktitle={Proceedings of Interspeech},
1275
+ pages={2207--2211},
1276
+ doi={10.21437/Interspeech.2018-1456},
1277
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
1278
+ }
1279
+
1280
+
1281
+
1282
+
1283
+ ```
1284
+
1285
+ or arXiv:
1286
+
1287
+ ```bibtex
1288
+ @misc{watanabe2018espnet,
1289
+ title={ESPnet: End-to-End Speech Processing Toolkit},
1290
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
1291
+ year={2018},
1292
+ eprint={1804.00015},
1293
+ archivePrefix={arXiv},
1294
+ primaryClass={cs.CL}
1295
+ }
1296
+ ```
data/en_token_list/bpe_unigram1000/bpe.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:80f7c79144bddbe6d5fd80aa8da9797a88cd0c546c40bf4350da35f8993085d3
3
+ size 253157
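The three added lines are a Git LFS pointer, not the SentencePiece model itself: the repository stores only the spec version, the SHA-256 object id, and the byte size of the real `bpe.model`. A minimal sketch of reading such a pointer follows. The helper name `parse_lfs_pointer` is an assumption for illustration and is not part of any Git LFS tooling.

```python
from typing import Dict, Union

# Pointer text copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:80f7c79144bddbe6d5fd80aa8da9797a88cd0c546c40bf4350da35f8993085d3
size 253157
"""


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Parse 'key value' lines of a Git LFS pointer file into a dict.

    Hypothetical helper: it expects the version, oid, and size keys
    and does no validation beyond that.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


if __name__ == "__main__":
    print(parse_lfs_pointer(POINTER))
```

After the real file is fetched (for example with `git lfs pull`), its size and SHA-256 digest can be compared against these fields to confirm the download is intact.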
data/nlsyms ADDED
File without changes
exp/asr_stats_raw_vid_sum/train/feats_stats.npz ADDED
Binary file (1.4 kB).
exp/roshansh_how2_asr_raw_ft_sum/RESULTS.md ADDED
@@ -0,0 +1,13 @@
1
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
2
+ # RESULTS
3
+ ## Environments
4
+ - date: `Mon Feb 7 15:24:21 EST 2022`
5
+ - python version: `3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]`
6
+ - espnet version: `espnet 0.10.6a1`
7
+ - pytorch version: `pytorch 1.10.1`
8
+ - Git hash: `04561cdf3b6c3bc1d51edb04c93b953759ef551d`
9
+ - Commit date: `Mon Feb 7 09:06:12 2022 -0500`
10
+
11
+ ## asr_raw_ft_sum
12
+ |dataset|ROUGE-1|ROUGE-2|ROUGE-L|METEOR|BERTScore|
13
+ |---|---|---|---|---|---|
exp/roshansh_how2_asr_raw_ft_sum/config.yaml ADDED
@@ -0,0 +1,1214 @@
1
+ config: conf/train_asr_conformer_vid_lf.yaml
2
+ print_config: false
3
+ log_level: INFO
4
+ dry_run: false
5
+ iterator_type: sequence
6
+ output_dir: exp/asr_raw_ft_sum
7
+ ngpu: 1
8
+ seed: 0
9
+ num_workers: 1
10
+ num_att_plot: 3
11
+ dist_backend: nccl
12
+ dist_init_method: env://
13
+ dist_world_size: 8
14
+ dist_rank: 0
15
+ local_rank: 0
16
+ dist_master_addr: localhost
17
+ dist_master_port: 45875
18
+ dist_launcher: null
19
+ multiprocessing_distributed: true
20
+ unused_parameters: true
21
+ sharded_ddp: false
22
+ cudnn_enabled: true
23
+ cudnn_benchmark: false
24
+ cudnn_deterministic: true
25
+ collect_stats: false
26
+ write_collected_feats: false
27
+ max_epoch: 100
28
+ patience: 10
29
+ val_scheduler_criterion:
30
+ - valid
31
+ - loss
32
+ early_stopping_criterion:
33
+ - valid
34
+ - loss
35
+ - min
36
+ best_model_criterion:
37
+ - - valid
38
+ - acc
39
+ - max
40
+ keep_nbest_models: 10
41
+ grad_clip: 5.0
42
+ grad_clip_type: 2.0
43
+ grad_noise: false
44
+ accum_grad: 10
45
+ no_forward_run: false
46
+ resume: true
47
+ train_dtype: float32
48
+ use_amp: false
49
+ log_interval: 5000
50
+ use_tensorboard: true
51
+ use_wandb: false
52
+ wandb_project: null
53
+ wandb_id: null
54
+ wandb_entity: null
55
+ wandb_name: null
56
+ wandb_model_log_interval: -1
57
+ detect_anomaly: false
58
+ pretrain_path: null
59
+ init_param:
60
+ - exp/asr_raw_utt_conformer/valid.acc.ave_10best.pth:::ctc
61
+ ignore_init_mismatch: false
62
+ freeze_param: []
63
+ num_iters_per_epoch: null
64
+ batch_size: 20
65
+ valid_batch_size: null
66
+ batch_bins: 60000000
67
+ valid_batch_bins: null
68
+ train_shape_file:
69
+ - exp/asr_stats_raw_vid_sum/train/speech_shape
70
+ - exp/asr_stats_raw_vid_sum/train/text_shape.bpe
71
+ valid_shape_file:
72
+ - exp/asr_stats_raw_vid_sum/valid/speech_shape
73
+ - exp/asr_stats_raw_vid_sum/valid/text_shape.bpe
74
+ batch_type: length
75
+ valid_batch_type: null
76
+ fold_length:
77
+ - 80000
78
+ - 150
79
+ sort_in_batch: descending
80
+ sort_batch: descending
81
+ multiple_iterator: false
82
+ chunk_length: 500
83
+ chunk_shift_ratio: 0.5
84
+ num_cache_chunks: 1024
85
+ train_data_path_and_name_and_type:
86
+ - - dump/raw/tr_2000h_sum_trim/wav.scp
87
+ - speech
88
+ - sound
89
+ - - dump/raw/tr_2000h_sum_trim/text
90
+ - text
91
+ - text
92
+ valid_data_path_and_name_and_type:
93
+ - - dump/raw/cv05_sum_trim/wav.scp
94
+ - speech
95
+ - sound
96
+ - - dump/raw/cv05_sum_trim/text
97
+ - text
98
+ - text
99
+ allow_variable_data_keys: false
100
+ max_cache_size: 0.0
101
+ max_cache_fd: 32
102
+ valid_max_cache_size: null
103
+ optim: adam
104
+ optim_conf:
105
+ lr: 0.001
106
+ scheduler: reducelronplateau
107
+ scheduler_conf:
108
+ mode: min
109
+ factor: 0.5
110
+ patience: 1
111
+ token_list:
112
+ - <blank>
113
+ - <unk>
114
+ - '[hes]'
115
+ - S
116
+ - ▁THE
117
+ - ▁TO
118
+ - ''''
119
+ - ▁AND
120
+ - ▁YOU
121
+ - ▁A
122
+ - ▁IT
123
+ - T
124
+ - ▁THAT
125
+ - ▁OF
126
+ - ▁I
127
+ - ▁IS
128
+ - RE
129
+ - ▁IN
130
+ - ING
131
+ - ▁WE
132
+ - M
133
+ - ▁GOING
134
+ - ▁SO
135
+ - ▁THIS
136
+ - ▁YOUR
137
+ - ▁ON
138
+ - E
139
+ - D
140
+ - ▁BE
141
+ - ▁CAN
142
+ - N
143
+ - Y
144
+ - O
145
+ - ER
146
+ - ▁HAVE
147
+ - ▁JUST
148
+ - ▁FOR
149
+ - ▁WITH
150
+ - ▁DO
151
+ - ED
152
+ - ▁ARE
153
+ - ▁WANT
154
+ - ▁UP
155
+ - R
156
+ - LL
157
+ - P
158
+ - ▁
159
+ - L
160
+ - B
161
+ - ▁IF
162
+ - C
163
+ - ▁ONE
164
+ - ▁S
165
+ - ▁OR
166
+ - A
167
+ - ▁GO
168
+ - ▁LIKE
169
+ - ▁NOW
170
+ - ▁HERE
171
+ - VE
172
+ - LE
173
+ - U
174
+ - ▁GET
175
+ - ▁WHAT
176
+ - ▁OUT
177
+ - IN
178
+ - W
179
+ - ▁C
180
+ - ▁LITTLE
181
+ - ▁THERE
182
+ - LY
183
+ - ▁AS
184
+ - ▁MAKE
185
+ - I
186
+ - ▁THEY
187
+ - ▁MY
188
+ - K
189
+ - ▁THEN
190
+ - ▁BUT
191
+ - AL
192
+ - G
193
+ - ▁ALL
194
+ - OR
195
+ - ▁BACK
196
+ - ▁NOT
197
+ - ▁ABOUT
198
+ - ▁RIGHT
199
+ - ▁OUR
200
+ - EN
201
+ - ▁SOME
202
+ - ▁DOWN
203
+ - F
204
+ - ▁WHEN
205
+ - CH
206
+ - ▁F
207
+ - ▁HOW
208
+ - AR
209
+ - ▁WILL
210
+ - ▁RE
211
+ - CK
212
+ - ▁G
213
+ - ES
214
+ - CE
215
+ - ▁TAKE
216
+ - ▁AT
217
+ - ▁FROM
218
+ - ▁WAY
219
+ - TER
220
+ - ▁SEE
221
+ - RA
222
+ - ▁USE
223
+ - ▁REALLY
224
+ - RI
225
+ - TH
226
+ - ▁TWO
227
+ - ▁ME
228
+ - ▁VERY
229
+ - ▁E
230
+ - ▁B
231
+ - AT
232
+ - ▁THEM
233
+ - ▁DON
234
+ - ▁AN
235
+ - ▁BECAUSE
236
+ - ▁MORE
237
+ - RO
238
+ - H
239
+ - 'ON'
240
+ - LI
241
+ - ▁PUT
242
+ - ▁ST
243
+ - IL
244
+ - ▁BIT
245
+ - ▁START
246
+ - ▁NEED
247
+ - ▁INTO
248
+ - UR
249
+ - ▁TIME
250
+ - ▁OVER
251
+ - ▁W
252
+ - ▁DE
253
+ - ▁LOOK
254
+ - ▁THESE
255
+ - ▁LET
256
+ - ▁GOOD
257
+ - ▁ALSO
258
+ - AN
259
+ - ▁OFF
260
+ - ▁HE
261
+ - ▁KIND
262
+ - ▁SIDE
263
+ - ▁CO
264
+ - ▁SURE
265
+ - ▁AGAIN
266
+ - ▁MA
267
+ - ▁KNOW
268
+ - IT
269
+ - ▁WOULD
270
+ - IC
271
+ - ▁OTHER
272
+ - LA
273
+ - ▁P
274
+ - ▁WHICH
275
+ - '-'
276
+ - IR
277
+ - ▁LA
278
+ - ▁HAND
279
+ - EL
280
+ - ▁LOT
281
+ - ▁WHERE
282
+ - ▁THREE
283
+ - ▁PA
284
+ - ION
285
+ - LO
286
+ - ▁KEEP
287
+ - ▁SHOW
288
+ - ▁THING
289
+ - ▁FIRST
290
+ - TE
291
+ - ENT
292
+ - ATE
293
+ - ▁COME
294
+ - AD
295
+ - ▁GOT
296
+ - NG
297
+ - ▁NICE
298
+ - ▁T
299
+ - ET
300
+ - ▁MO
301
+ - ▁ANY
302
+ - ▁ACTUALLY
303
+ - ▁DIFFERENT
304
+ - ▁SE
305
+ - GE
306
+ - ▁WORK
307
+ - ▁THROUGH
308
+ - ▁O
309
+ - KE
310
+ - V
311
+ - ▁AROUND
312
+ - ▁BA
313
+ - PE
314
+ - ▁HI
315
+ - ▁BY
316
+ - SH
317
+ - ATION
318
+ - ▁SU
319
+ - ▁CA
320
+ - ▁D
321
+ - ▁LO
322
+ - ▁HAS
323
+ - ▁LI
324
+ - ▁PLAY
325
+ - Z
326
+ - ▁ADD
327
+ - ▁RO
328
+ - ▁TA
329
+ - AS
330
+ - ▁FOUR
331
+ - ▁CON
332
+ - ▁THOSE
333
+ - MP
334
+ - NE
335
+ - ▁SP
336
+ - UT
337
+ - ▁GIVE
338
+ - ▁WELL
339
+ - ▁BALL
340
+ - TING
341
+ - RY
342
+ - X
343
+ - ▁HO
344
+ - INE
345
+ - IVE
346
+ - ▁NEXT
347
+ - ▁PO
348
+ - ▁STEP
349
+ - ▁EVEN
350
+ - TION
351
+ - ▁MI
352
+ - MENT
353
+ - ▁CUT
354
+ - ▁BO
355
+ - ▁LINE
356
+ - ▁MUCH
357
+ - ▁THINGS
358
+ - ▁TALK
359
+ - UN
360
+ - ▁PART
361
+ - ▁WAS
362
+ - ▁FA
363
+ - ▁SOMETHING
364
+ - PP
365
+ - ANCE
366
+ - ND
367
+ - DI
368
+ - ▁RA
369
+ - AGE
370
+ - ▁SAME
371
+ - ▁EXPERT
372
+ - ▁DOING
373
+ - ▁LEFT
374
+ - IST
375
+ - ▁DI
376
+ - ▁NO
377
+ - RU
378
+ - ME
379
+ - TA
380
+ - UL
381
+ - TI
382
+ - ▁VILLAGE
383
+ - DE
384
+ - ERS
385
+ - ▁PEOPLE
386
+ - ▁TURN
387
+ - VER
388
+ - ▁FL
389
+ - ▁LEG
390
+ - ▁ONCE
391
+ - ▁COLOR
392
+ - ▁PULL
393
+ - ▁USING
394
+ - VI
395
+ - ▁WATER
396
+ - ▁SHE
397
+ - ▁TOP
398
+ - ▁OKAY
399
+ - ▁ANOTHER
400
+ - ▁THEIR
401
+ - ▁SAY
402
+ - URE
403
+ - ▁HA
404
+ - ▁IMPORTANT
405
+ - ▁PIECE
406
+ - ▁FOOT
407
+ - ▁TRA
408
+ - ▁SC
409
+ - ▁BODY
410
+ - ▁SET
411
+ - ▁POINT
412
+ - ▁HELP
413
+ - ▁TODAY
414
+ - ▁BRING
415
+ - ▁V
416
+ - ▁END
417
+ - MA
418
+ - ▁CH
419
+ - ▁MOST
420
+ - ▁K
421
+ - ▁AHEAD
422
+ - ▁HER
423
+ - OL
424
+ - ▁SA
425
+ - AM
426
+ - IES
427
+ - ▁THINK
428
+ - ▁NAME
429
+ - ▁TRY
430
+ - ▁MOVE
431
+ - ONE
432
+ - ▁LE
433
+ - ▁TOO
434
+ - TO
435
+ - UM
436
+ - ▁PLACE
437
+ - ▁COULD
438
+ - ▁FIND
439
+ - ▁FIVE
440
+ - ▁ALWAYS
441
+ - ID
442
+ - TY
443
+ - NT
444
+ - ▁FEEL
445
+ - ▁HEAD
446
+ - ▁THAN
447
+ - NA
448
+ - ▁EX
449
+ - ▁EYE
450
+ - ITY
451
+ - CI
452
+ - OP
453
+ - ▁SHOULD
454
+ - ▁MIGHT
455
+ - ▁HOLD
456
+ - ▁CAR
457
+ - AND
458
+ - ▁GREAT
459
+ - ▁RI
460
+ - ▁BU
461
+ - ▁HIGH
462
+ - ▁OPEN
463
+ - ▁BEFORE
464
+ - US
465
+ - ▁FRONT
466
+ - ▁LONG
467
+ - ▁TOGETHER
468
+ - NI
469
+ - ▁HAIR
470
+ - ▁LIGHT
471
+ - ▁TEN
472
+ - ▁HIT
473
+ - EST
474
+ - OUS
475
+ - ▁PRETTY
476
+ - ▁TYPE
477
+ - IP
478
+ - CO
479
+ - ▁FINGER
480
+ - ▁JO
481
+ - ▁UN
482
+ - ▁PRO
483
+ - ▁STRAIGHT
484
+ - ▁BEHALF
485
+ - ▁TI
486
+ - ▁SIX
487
+ - ▁CLEAN
488
+ - ▁DIS
489
+ - ▁DA
490
+ - ▁POSITION
491
+ - IGHT
492
+ - ACT
493
+ - ▁CHA
494
+ - ▁PE
495
+ - GG
496
+ - AP
497
+ - ▁MEAN
498
+ - ▁COMP
499
+ - FI
500
+ - ▁KNEE
501
+ - ▁CALLED
502
+ - ▁HANDS
503
+ - ▁PRE
504
+ - ▁FORWARD
505
+ - ▁AREA
506
+ - ANT
507
+ - ▁TE
508
+ - ▁WA
509
+ - ▁AFTER
510
+ - ▁SMALL
511
+ - ▁THROW
512
+ - ▁EVERY
513
+ - ▁SHOULDER
514
+ - NC
515
+ - PER
516
+ - ▁MAYBE
517
+ - ▁ABLE
518
+ - ▁BASICALLY
519
+ - ▁AM
520
+ - ▁READY
521
+ - ▁BOTTOM
522
+ - IE
523
+ - ▁HALF
524
+ - FF
525
+ - ▁BIG
526
+ - ▁EACH
527
+ - ▁PUSH
528
+ - ▁EIGHT
529
+ - ▁NEW
530
+ - ▁DONE
531
+ - ▁MAY
532
+ - ▁GETTING
533
+ - HO
534
+ - ▁HIS
535
+ - ▁HARD
536
+ - ▁CLOSE
537
+ - ALLY
538
+ - ▁SECOND
539
+ - ▁FEET
540
+ - ICAL
541
+ - ▁JA
542
+ - ▁PAINT
543
+ - ▁LEARN
544
+ - ▁SOUND
545
+ - HE
546
+ - ▁ROLL
547
+ - ▁ONLY
548
+ - ▁DOESN
549
+ - WA
550
+ - ▁DRAW
551
+ - ▁VI
552
+ - ▁DID
553
+ - ▁SHA
554
+ - ▁CENTER
555
+ - CU
556
+ - ▁CLIP
557
+ - ▁PI
558
+ - ▁CARD
559
+ - ▁INSIDE
560
+ - ▁PERSON
561
+ - ▁STILL
562
+ - ▁MAKING
563
+ - 'NO'
564
+ - ▁EVERYTHING
565
+ - .
566
+ - ▁FUN
567
+ - ARD
568
+ - ▁REMEMBER
569
+ - ▁AWAY
570
+ - ATED
571
+ - COM
572
+ - ▁SEVEN
573
+ - ▁BEEN
574
+ - ▁MANY
575
+ - ABLE
576
+ - ▁DAY
577
+ - ▁SIT
578
+ - IZE
579
+ - ▁REAL
580
+ - ▁HIP
581
+ - ▁BASIC
582
+ - ▁KICK
583
+ - ▁TU
584
+ - ATING
585
+ - ▁STICK
586
+ - ▁FLAT
587
+ - ▁WHO
588
+ - END
589
+ - HA
590
+ - ▁EXP
591
+ - ▁PICK
592
+ - ▁MIX
593
+ - ▁TRI
594
+ - ▁BI
595
+ - ▁WHOLE
596
+ - ▁STRETCH
597
+ - ▁BOTH
598
+ - ▁PROBABLY
599
+ - CA
600
+ - ▁HIM
601
+ - ▁STRING
602
+ - ▁EDGE
603
+ - ▁BASE
604
+ - ▁COMING
605
+ - UGH
606
+ - ▁LIFT
607
+ - ▁STA
608
+ - ▁WORKING
609
+ - ▁MU
610
+ - ▁QUICK
611
+ - ▁SOMETIMES
612
+ - ▁HAPPEN
613
+ - ▁YOURSELF
614
+ - ▁TALKING
615
+ - ▁DR
616
+ - ▁TELL
617
+ - ▁ANYTHING
618
+ - ▁BRA
619
+ - ▁LOOKING
620
+ - ▁SLOW
621
+ - ▁NE
622
+ - ▁STAND
623
+ - NER
624
+ - ▁COMES
625
+ - ▁GOES
626
+ - ISE
627
+ - BE
628
+ - ▁USED
629
+ - ▁UNDER
630
+ - ▁BETWEEN
631
+ - ▁HU
632
+ - ▁CREATE
633
+ - ▁NA
634
+ - ▁USUALLY
635
+ - ▁ARM
636
+ - ▁DRY
637
+ - ▁RUN
638
+ - LING
639
+ - ▁BRUSH
640
+ - ▁COVER
641
+ - ▁HEAR
642
+ - ▁DOES
643
+ - ▁STAY
644
+ - ▁EN
645
+ - ▁FOLD
646
+ - ▁CHANGE
647
+ - ▁LAST
648
+ - ▁EASY
649
+ - ▁US
650
+ - ▁PER
651
+ - ▁FACE
652
+ - ▁EAR
653
+ - ▁TIGHT
654
+ - ▁FE
655
+ - ▁PIN
656
+ - ▁MAN
657
+ - ▁BETTER
658
+ - ▁CALL
659
+ - ▁PRI
660
+ - ▁BEST
661
+ - ▁KI
662
+ - ▁COUPLE
663
+ - ▁WHILE
664
+ - ▁SHAPE
665
+ - ▁GAME
666
+ - IV
667
+ - ▁SHOT
668
+ - ▁PAPER
669
+ - ▁OWN
670
+ - ▁ALRIGHT
671
+ - ▁HAD
672
+ - TIC
673
+ - ▁BREATH
674
+ - ▁TOOL
675
+ - '2'
676
+ - ▁ENOUGH
677
+ - ▁COURSE
678
+ - ▁SKIN
679
+ - ▁SPIN
680
+ - ▁VA
681
+ - ▁ARMS
682
+ - ▁TEA
683
+ - ▁BREAK
684
+ - ▁DOG
685
+ - ▁1
686
+ - QUE
687
+ - ▁DROP
688
+ - ▁NUMBER
689
+ - IG
690
+ - ▁RED
691
+ - ▁NOTE
692
+ - ▁WEIGHT
693
+ - WARD
694
+ - ▁PLAYING
695
+ - ▁FINISH
696
+ - ▁MINUTE
697
+ - ▁R
698
+ - ▁PRESS
699
+ - ▁EITHER
700
+ - ▁CHE
701
+ - ▁PU
702
+ - BER
703
+ - ▁FEW
704
+ - ▁SIZE
705
+ - ▁MADE
706
+ - ▁LEAVE
707
+ - ▁GA
708
+ - ▁ALREADY
709
+ - ▁GUY
710
+ - ▁FAR
711
+ - ▁HOME
712
+ - ▁BAR
713
+ - UP
714
+ - ▁GRAB
715
+ - ▁MARK
716
+ - ▁WHITE
717
+ - ▁PROPER
718
+ - ▁CAUSE
719
+ - ▁OK
720
+ - ▁ART
721
+ - HI
722
+ - ▁SORT
723
+ - ▁EXERCISE
724
+ - ▁LOWER
725
+ - PORT
726
+ - ▁PLANT
727
+ - ▁BOARD
728
+ - ▁CASE
729
+ - ▁YEAR
730
+ - CENT
731
+ - ▁DU
732
+ - ▁CHECK
733
+ - ▁WHATEVER
734
+ - ▁OIL
735
+ - ▁IDEA
736
+ - ▁SIMPLE
737
+ - ▁PRACTICE
738
+ - ▁FAST
739
+ - '0'
740
+ - ▁CONTROL
741
+ - ▁J
742
+ - ▁KEY
743
+ - ▁MIDDLE
744
+ - ▁FULL
745
+ - ▁GLASS
746
+ - ▁OUTSIDE
747
+ - ▁LOW
748
+ - ▁REST
749
+ - ▁STUFF
750
+ - ▁ACT
751
+ - ▁UNTIL
752
+ - ▁BLACK
753
+ - ▁POP
754
+ - ▁CLICK
755
+ - ▁HOLE
756
+ - ▁Z
757
+ - ▁COUNT
758
+ - ▁POT
759
+ - ▁ALLOW
760
+ - ▁HAVING
761
+ - ▁TRYING
762
+ - ▁MUSCLE
763
+ - ▁GU
764
+ - ▁BOX
765
+ - ▁NOTICE
766
+ - ▁EXAMPLE
767
+ - UND
768
+ - ▁ALONG
769
+ - FUL
770
+ - ISH
771
+ - ▁STORE
772
+ - ▁LU
773
+ - ▁FLOOR
774
+ - ▁MOVING
775
+ - ▁LARGE
776
+ - ▁STOP
777
+ - ▁PH
778
+ - ▁WALK
779
+ - '5'
780
+ - ▁QU
781
+ - ▁TECHNIQUE
782
+ - ▁SOFT
783
+ - ▁GROUND
784
+ - ▁JUMP
785
+ - ▁JU
786
+ - ▁FILL
787
+ - ▁WHY
788
+ - ▁BUY
789
+ - ▁GREEN
790
+ - ▁WALL
791
+ - ▁HEEL
792
+ - NESS
793
+ - ▁LEVEL
794
+ - ▁UNDERNEATH
795
+ - ▁PATTERN
796
+ - ▁BEHIND
797
+ - ▁OLD
798
+ - ▁TIP
799
+ - ▁COMPLETE
800
+ - ▁WON
801
+ - ▁TEACH
802
+ - ▁FIT
803
+ - ▁NECK
804
+ - ▁REMOVE
805
+ - ▁TRICK
806
+ - ▁MOVEMENT
807
+ - ▁TOWARDS
808
+ - ▁PARTICULAR
809
+ - ▁CHI
810
+ - ▁EFFECT
811
+ - J
812
+ - ▁FREE
813
+ - ▁ACROSS
814
+ - ▁BEND
815
+ - ▁SAFE
816
+ - ▁SLIDE
817
+ - ▁PROBLEM
818
+ - ▁BLOCK
819
+ - ▁PAN
820
+ - ▁NATURAL
821
+ - ▁TOUCH
822
+ - ▁CHILD
823
+ - LINE
824
+ - ▁CROSS
825
+ - ▁REASON
826
+ - '4'
827
+ - ▁POWER
828
+ - ▁APPLY
829
+ - ▁FOLLOW
830
+ - ▁DESIGN
831
+ - ▁SPACE
832
+ - ▁ORDER
833
+ - ▁WOOD
834
+ - ▁RID
835
+ - '3'
836
+ - ▁COOK
837
+ - ▁BEGIN
838
+ - ▁WATCH
839
+ - ▁STYLE
840
+ - QUA
841
+ - ▁PRODUCT
842
+ - ▁TAKING
843
+ - ▁PUTTING
844
+ - ▁EXHALE
845
+ - ▁THOUGH
846
+ - ▁DEEP
847
+ - IAN
848
+ - ▁REACH
849
+ - ▁FOOD
850
+ - ▁ALMOST
851
+ - ▁COOL
852
+ - ▁SECTION
853
+ - ▁SAID
854
+ - ▁ANGLE
855
+ - ▁MUSIC
856
+ - ▁RELAX
857
+ - ▁CORNER
858
+ - ▁DARK
859
+ - ▁CHORD
860
+ - ▁ESPECIALLY
861
+ - ▁SCALE
862
+ - ▁WARM
863
+ - ▁WITHOUT
864
+ - ▁WHEEL
865
+ - ▁SEGMENT
866
+ - ▁TABLE
867
+ - ▁BOOK
868
+ - ▁PASS
869
+ - ▁ELBOW
870
+ - ▁ROUND
871
+ - ▁INHALE
872
+ - ▁SMOOTH
873
+ - ▁ROOM
874
+ - /
875
+ - ▁NINE
876
+ - ▁SHORT
877
+ - ▁MEASURE
878
+ - ▁LESS
879
+ - ▁TWIST
880
+ - ▁BALANCE
881
+ - ▁PROCESS
882
+ - ▁SWITCH
883
+ - ▁GENERAL
884
+ - ▁CLAY
885
+ - ▁CERTAIN
886
+ - ▁NEVER
887
+ - ▁BLUE
888
+ - ▁CUP
889
+ - ▁HOUSE
890
+ - ▁EXTRA
891
+ - ▁MOTION
892
+ - ▁PRESSURE
893
+ - ▁FIRE
894
+ - ▁SIMPLY
895
+ - ▁DOUBLE
896
+ - ▁TWENTY
897
+ - ▁CATCH
898
+ - ▁BECOME
899
+ - ▁BUILD
900
+ - ▁SPEED
901
+ - ▁TRANS
902
+ - ▁DRUM
903
+ - ▁CHEST
904
+ - ▁PICTURE
905
+ - ▁LENGTH
906
+ - ▁CONTINUE
907
+ - ▁COMFORTABLE
908
+ - ▁FISH
909
+ - ▁PHOTO
910
+ - ▁LOOSE
911
+ - ▁SKI
912
+ - ▁LIFE
913
+ - ▁DEGREE
914
+ - ▁OPTION
915
+ - ▁WORD
916
+ - ▁SHARP
917
+ - ▁SHOOT
918
+ - ▁FOUND
919
+ - ▁STRONG
920
+ - ▁QUITE
921
+ - ▁THIRD
922
+ - ▁GLUE
923
+ - ▁MIND
924
+ - ▁DEFINITELY
925
+ - ▁EASIER
926
+ - GRAPH
927
+ - ▁HOOK
928
+ - ▁CLEAR
929
+ - ▁POSE
930
+ - ▁BUTTON
931
+ - ▁CHOOSE
932
+ - ▁THICK
933
+ - ▁SYSTEM
934
+ - ▁PERFECT
935
+ - ▁BEAUTIFUL
936
+ - ▁SPOT
937
+ - ▁GROW
938
+ - ▁SIGN
939
+ - ▁ELSE
940
+ - ▁CONNECT
941
+ - ▁SELECT
942
+ - ▁PUNCH
943
+ - ▁DIRECTION
944
+ - ▁WRAP
945
+ - ▁RELEASE
946
+ - QUI
947
+ - SIDE
948
+ - ▁CAREFUL
949
+ - ▁VIDEO
950
+ - ▁INSTEAD
951
+ - ▁CIRCLE
952
+ - ▁WIRE
953
+ - ▁NOSE
954
+ - ▁AMOUNT
955
+ - ▁FOCUS
956
+ - ▁NORMAL
957
+ - ▁MAJOR
958
+ - ▁WHETHER
959
+ - ▁SURFACE
960
+ - ▁THUMB
961
+ - ▁DRIVE
962
+ - ▁SCREW
963
+ - ▁POSSIBLE
964
+ - ▁OBVIOUSLY
965
+ - ▁COMMON
966
+ - ▁REGULAR
967
+ - ▁ADJUST
968
+ - ▁WIDE
969
+ - ▁BLADE
970
+ - ▁FRET
971
+ - ▁RECOMMEND
972
+ - ▁BOWL
973
+ - BOARD
974
+ - ▁IMAGE
975
+ - ▁DEPENDING
976
+ - ▁PROTECT
977
+ - ▁CLOTH
978
+ - ▁HEALTH
979
+ - ▁WRIST
980
+ - ▁CLUB
981
+ - ▁DRINK
982
+ - ▁SINCE
983
+ - ▁FRIEND
984
+ - '00'
985
+ - ▁RUNNING
986
+ - ▁ITSELF
987
+ - ▁RECORD
988
+ - ▁SWING
989
+ - ▁DIRECT
990
+ - ▁MATERIAL
991
+ - ▁YO
992
+ - ▁LEAST
993
+ - ▁EXACTLY
994
+ - ▁BEGINNING
995
+ - ▁SLIGHTLY
996
+ - ▁TREAT
997
+ - ▁CAMERA
998
+ - ▁QUARTER
999
+ - ▁WINDOW
1000
+ - '8'
1001
+ - ▁SOMEBODY
1002
+ - ▁BURN
1003
+ - ▁DEMONSTRATE
1004
+ - ▁DIFFERENCE
1005
+ - ▁COMPUTER
1006
+ - IBLE
1007
+ - ▁SHOE
1008
+ - ▁PERFORM
1009
+ - ▁SQUARE
1010
+ - ▁CONSIDER
1011
+ - ▁DRILL
1012
+ - ▁TEXT
1013
+ - ▁FILE
1014
+ - ▁RUB
1015
+ - ▁FABRIC
1016
+ - ▁HUNDRED
1017
+ - ▁GRIP
1018
+ - ▁CHARACTER
1019
+ - ▁SPECIFIC
1020
+ - ▁KNOT
1021
+ - ▁CURL
1022
+ - ▁STITCH
1023
+ - ▁BLEND
1024
+ - ▁FRAME
1025
+ - ▁THIRTY
1026
+ - '1'
1027
+ - ▁HORSE
1028
+ - ▁ATTACH
1029
+ - ▁GROUP
1030
+ - ▁STROKE
1031
+ - ▁GUITAR
1032
+ - ▁APART
1033
+ - ▁MACHINE
1034
+ - ▁CLASS
1035
+ - ▁COMB
1036
+ - ▁ROOT
1037
+ - ▁HELLO
1038
+ - ▁ENERGY
1039
+ - ▁ATTACK
1040
+ - ▁CORRECT
1041
+ - ▁EXTEND
1042
+ - ▁MINOR
1043
+ - ▁PROFESSIONAL
1044
+ - ▁MONEY
1045
+ - ▁STRIP
1046
+ - ▁FLAVOR
1047
+ - ▁EVERYBODY
1048
+ - ▁RULE
1049
+ - ▁DIFFICULT
1050
+ - ▁PROJECT
1051
+ - ▁DISCUSS
1052
+ - ▁FIGURE
1053
+ - ▁HOWEVER
1054
+ - ▁FINAL
1055
+ - ▁STRENGTH
1056
+ - ▁ENTIRE
1057
+ - ▁FIELD
1058
+ - ▁CONTACT
1059
+ - ▁SUPPORT
1060
+ - ▁PALM
1061
+ - ▁SERIES
1062
+ - ▁ENJOY
1063
+ - '6'
1064
+ - ▁WORLD
1065
+ - ▁DECIDE
1066
+ - ▁SPEAK
1067
+ - ▁SEVERAL
1068
+ - ▁WRITE
1069
+ - ▁PROGRAM
1070
+ - ABILITY
1071
+ - ▁KNIFE
1072
+ - ▁PLASTIC
1073
+ - ▁ORGAN
1074
+ - '7'
1075
+ - ▁UNDERSTAND
1076
+ - ▁FIFTEEN
1077
+ - ▁FLEX
1078
+ - ▁INFORMATION
1079
+ - ▁TWELVE
1080
+ - ▁DETAIL
1081
+ - ▁STRIKE
1082
+ - ▁ACTUAL
1083
+ - ▁SPRAY
1084
+ - ▁LOCAL
1085
+ - ▁MOUTH
1086
+ - ▁NIGHT
1087
+ - ▁VEHICLE
1088
+ - ▁OPPOSITE
1089
+ - ▁SCHOOL
1090
+ - '9'
1091
+ - ▁QUESTION
1092
+ - ▁SPECIAL
1093
+ - ▁BIGGER
1094
+ - ▁DEVELOP
1095
+ - ▁PEPPER
1096
+ - ▁PREFER
1097
+ - Q
1098
+ - '%'
1099
+ - ']'
1100
+ - '['
1101
+ - '&'
1102
+ - ','
1103
+ - _
1104
+ - '#'
1105
+ - '='
1106
+ - '@'
1107
+ - +
1108
+ - '*'
1109
+ - $
1110
+ - '~'
1111
+ - <sos/eos>
1112
+ init: null
1113
+ input_size: null
1114
+ ctc_conf:
1115
+ ignore_nan_grad: true
1116
+ model_conf:
1117
+ ctc_weight: 0.0
1118
+ lsm_weight: 0.15
1119
+ length_normalized_loss: false
1120
+ use_preprocessor: true
1121
+ token_type: bpe
1122
+ bpemodel: data/en_token_list/bpe_unigram1000/bpe.model
1123
+ non_linguistic_symbols: data/nlsyms
1124
+ cleaner: null
1125
+ g2p: null
1126
+ speech_volume_normalize: null
1127
+ rir_scp: null
1128
+ rir_apply_prob: 1.0
1129
+ noise_scp: null
1130
+ noise_apply_prob: 1.0
1131
+ noise_db_range: '13_15'
1132
+ frontend: default
1133
+ frontend_conf:
1134
+ n_fft: 512
1135
+ hop_length: 256
1136
+ fs: 16k
1137
+ specaug: specaug
1138
+ specaug_conf:
1139
+ apply_time_warp: true
1140
+ time_warp_window: 5
1141
+ time_warp_mode: bicubic
1142
+ apply_freq_mask: true
1143
+ freq_mask_width_range:
1144
+ - 0
1145
+ - 30
1146
+ num_freq_mask: 2
1147
+ apply_time_mask: true
1148
+ time_mask_width_range:
1149
+ - 0
1150
+ - 40
1151
+ num_time_mask: 2
1152
+ normalize: global_mvn
1153
+ normalize_conf:
1154
+ stats_file: exp/asr_stats_raw_vid_sum/train/feats_stats.npz
1155
+ preencoder: null
1156
+ preencoder_conf: {}
1157
+ encoder: conformer
1158
+ encoder_conf:
1159
+ output_size: 512
1160
+ attention_heads: 8
1161
+ linear_units: 2048
1162
+ num_blocks: 12
1163
+ dropout_rate: 0.1
1164
+ positional_dropout_rate: 0.1
1165
+ attention_dropout_rate: 0.1
1166
+ input_layer: conv2d
1167
+ normalize_before: true
1168
+ macaron_style: true
1169
+ pos_enc_layer_type: abs_pos
1170
+ selfattention_layer_type: lf_selfattn
1171
+ activation_type: swish
1172
+ use_cnn_module: true
1173
+ cnn_module_kernel: 31
1174
+ attention_windows:
1175
+ - 40
1176
+ - 40
1177
+ - 40
1178
+ - 40
1179
+ - 40
1180
+ - 40
1181
+ - 40
1182
+ - 40
1183
+ - 40
1184
+ - 40
1185
+ - 40
1186
+ - 40
1187
+ attention_dilation:
1188
+ - 1
1189
+ - 1
1190
+ - 1
1191
+ - 1
1192
+ - 1
1193
+ - 1
1194
+ - 1
1195
+ - 1
1196
+ - 1
1197
+ - 1
1198
+ - 1
1199
+ - 1
1200
+ attention_mode: tvm
1201
+ decoder: transformer
1202
+ decoder_conf:
1203
+ attention_heads: 4
1204
+ linear_units: 512
1205
+ num_blocks: 6
1206
+ dropout_rate: 0.15
1207
+ positional_dropout_rate: 0.15
1208
+ self_attention_dropout_rate: 0.15
1209
+ src_attention_dropout_rate: 0.15
1210
+ required:
1211
+ - output_dir
1212
+ - token_list
1213
+ version: 0.10.0
1214
+ distributed: true
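The `frontend_conf` and `specaug_conf` values in the config above are given in samples and feature frames. As a rough sketch, they can be converted to milliseconds as shown below. Two things here are assumptions rather than facts from the config: that the STFT window length equals `n_fft` (no `win_length` is listed), and that `input_layer: conv2d` shortens the time axis by about 4x.

```python
# Values copied from frontend_conf in the config above.
FS_HZ = 16_000        # fs: 16k
N_FFT = 512           # n_fft (assumed to also be the window length)
HOP_LENGTH = 256      # hop_length

# Assumption: `input_layer: conv2d` reduces the time axis by roughly 4x
# (two stride-2 convolutions). Check the encoder source before relying on it.
ASSUMED_SUBSAMPLING = 4


def ms(samples: int, fs_hz: int = FS_HZ) -> float:
    """Convert a sample count to milliseconds."""
    return 1000.0 * samples / fs_hz


window_ms = ms(N_FFT)                           # STFT analysis window
hop_ms = ms(HOP_LENGTH)                         # spacing between feature frames
encoder_step_ms = hop_ms * ASSUMED_SUBSAMPLING  # spacing between encoder frames

# time_mask_width_range tops out at 40 feature frames (before subsampling).
max_time_mask_ms = 40 * hop_ms

print(f"window={window_ms} ms, hop={hop_ms} ms")
print(f"encoder step ~{encoder_step_ms} ms, max time mask ~{max_time_mask_ms} ms")
```

Under these assumptions each encoder output frame covers about 64 ms of audio, and a single time mask hides at most roughly 0.64 s.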
exp/roshansh_how2_asr_raw_ft_sum/images/acc.png ADDED
exp/roshansh_how2_asr_raw_ft_sum/images/backward_time.png ADDED
exp/roshansh_how2_asr_raw_ft_sum/images/cer.png ADDED
exp/roshansh_how2_asr_raw_ft_sum/images/cer_ctc.png ADDED
exp/roshansh_how2_asr_raw_ft_sum/images/forward_time.png ADDED
exp/roshansh_how2_asr_raw_ft_sum/images/gpu_max_cached_mem_GB.png ADDED
exp/roshansh_how2_asr_raw_ft_sum/images/iter_time.png ADDED
exp/roshansh_how2_asr_raw_ft_sum/images/loss.png ADDED
exp/roshansh_how2_asr_raw_ft_sum/images/loss_att.png ADDED
exp/roshansh_how2_asr_raw_ft_sum/images/loss_ctc.png ADDED
exp/roshansh_how2_asr_raw_ft_sum/images/optim0_lr0.png ADDED
exp/roshansh_how2_asr_raw_ft_sum/images/optim_step_time.png ADDED
exp/roshansh_how2_asr_raw_ft_sum/images/train_time.png ADDED
exp/roshansh_how2_asr_raw_ft_sum/images/wer.png ADDED
exp/roshansh_how2_asr_raw_ft_sum/valid.acc.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:309db0e004a1922b28a3a74b3173412b156167dcf08b816fe45f9c7406547808
3
+ size 413117831
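The three lines above are a Git LFS pointer, not the checkpoint itself: they record the hash algorithm, the digest, and the byte size of the real file. A minimal sketch of checking a downloaded checkpoint against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of any Git LFS tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file has the size and hash recorded in the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated download
    h = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Read in chunks so a ~400 MB checkpoint is not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:309db0e004a1922b28a3a74b3173412b156167dcf08b816fe45f9c7406547808
size 413117831
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer(Path("valid.acc.ave_10best.pth"), pointer)` on a local copy would then report whether the download is complete and uncorrupted.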
meta.yaml ADDED
@@ -0,0 +1,8 @@
1
+ espnet: 0.10.7a1
2
+ files:
3
+ asr_model_file: exp/roshansh_how2_asr_raw_ft_sum/valid.acc.ave_10best.pth
4
+ python: "3.8.12 (default, Oct 12 2021, 13:49:34) \n[GCC 7.5.0]"
5
+ timestamp: 1644953744.151672
6
+ torch: 1.10.1
7
+ yaml_files:
8
+ asr_train_config: exp/roshansh_how2_asr_raw_ft_sum/config.yaml
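The `timestamp` field in meta.yaml is a bare float. Assuming it is seconds since the Unix epoch (the file does not say so explicitly), a short sketch of turning it into a readable UTC date is:

```python
from datetime import datetime, timezone

# Value copied from the `timestamp` field of meta.yaml above.
# Assumption: it counts seconds since the Unix epoch.
PACKED_AT = 1644953744.151672

packed_utc = datetime.fromtimestamp(PACKED_AT, tz=timezone.utc)
print(packed_utc.isoformat())
```

Under that assumption the model was packed on 15 February 2022 (UTC).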