Commit b046f80 by Siddhant (1 parent: c1d6237)

Update model
README.md ADDED
@@ -0,0 +1,815 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: en
+ datasets:
+ - stop
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `espnet/stop_hubert_slu_raw_en_bpe500`
+
+ This model was trained by Siddhant using the stop recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ ```bash
+ cd espnet
+ git checkout 11890fdd9dd872edc50ce8eb7660d746c6ee160e
+ pip install -e .
+ cd egs2/stop/asr3
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/stop_hubert_slu_raw_en_bpe500
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Sun Dec 25 13:33:10 EST 2022`
+ - python version: `3.9.5 (default, Jun 4 2021, 12:28:51) [GCC 7.5.0]`
+ - espnet version: `espnet 202205`
+ - pytorch version: `pytorch 1.13.0+cu116`
+ - Git hash: `11890fdd9dd872edc50ce8eb7660d746c6ee160e`
+ - Commit date: `Sat Jun 18 17:05:39 2022 -0400`
+
+ ## asr_train_asr2_hubert_lr0.002_raw_en_bpe500
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave_10best/test|75636|728701|93.9|3.2|2.9|3.1|9.1|29.8|
+ |decode_asr_asr_model_valid.acc.ave_10best/valid|33384|322094|0.0|0.0|100.0|0.0|100.0|100.0|
+ |inference_asr_model_valid.acc.ave_10best/test|75636|728701|93.9|3.3|2.8|3.2|9.4|30.6|
+ |inference_asr_model_valid.acc.ave_10best/valid|33384|322094|0.0|0.0|100.0|0.0|100.0|100.0|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave_10best/test|75636|5745269|95.9|0.9|3.2|3.2|7.3|29.8|
+ |decode_asr_asr_model_valid.acc.ave_10best/valid|33384|2537594|0.0|0.0|100.0|0.0|100.0|100.0|
+ |inference_asr_model_valid.acc.ave_10best/test|75636|5745269|95.9|1.0|3.1|3.3|7.4|30.6|
+ |inference_asr_model_valid.acc.ave_10best/valid|33384|2537594|0.0|0.0|100.0|0.0|100.0|100.0|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave_10best/test|75636|2091389|95.1|1.5|3.4|3.1|8.0|29.8|
+ |decode_asr_asr_model_valid.acc.ave_10best/valid|33384|921077|0.0|0.0|100.0|0.0|100.0|100.0|
+ |inference_asr_model_valid.acc.ave_10best/test|75636|2091389|95.2|1.5|3.3|3.3|8.1|30.6|
+ |inference_asr_model_valid.acc.ave_10best/valid|33384|921077|0.0|0.0|100.0|0.0|100.0|100.0|
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/train_asr2_hubert_lr0.002.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 57197
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 50
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param:
+ - frontend.upstream
+ num_iters_per_epoch: null
+ batch_size: 128
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_en_bpe500/train/speech_shape
+ - exp/asr_stats_raw_en_bpe500/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_raw_en_bpe500/valid/speech_shape
+ - exp/asr_stats_raw_en_bpe500/valid/text_shape.bpe
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/valid/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/valid/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 0.0004
+     weight_decay: 1.0e-06
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 25000
+ token_list:
+ - <blank>
+ - <unk>
+ - ▁[
+ - ':'
+ - ▁]
+ - _
+ - SL
+ - IN
+ - GET
+ - S
+ - TIME
+ - DATE
+ - ▁THE
+ - ▁TO
+ - ▁FOR
+ - ▁
+ - E
+ - LOCATION
+ - A
+ - WEATHER
+ - O
+ - ▁ME
+ - MUSIC
+ - ▁MY
+ - CREATE
+ - ALARM
+ - Y
+ - D
+ - ▁I
+ - T
+ - ▁AT
+ - I
+ - ▁A
+ - TIMER
+ - ▁IS
+ - U
+ - ▁IN
+ - ▁ON
+ - EVENT
+ - M
+ - ▁TIMER
+ - TODO
+ - REMINDER
+ - R
+ - ▁PM
+ - P
+ - ING
+ - ▁WHAT
+ - ▁THIS
+ - ▁TODAY
+ - ▁AM
+ - N
+ - ▁ALARM
+ - ▁SET
+ - NT
+ - METHOD
+ - ▁TOMORROW
+ - ER
+ - TYPE
+ - B
+ - ATTRIBUTE
+ - DESTINATION
+ - ▁MINUTES
+ - REMINDED
+ - PERSON
+ - L
+ - ▁HOW
+ - NAME
+ - K
+ - ▁FIVE
+ - ▁BE
+ - ▁'
+ - G
+ - ▁NEXT
+ - 'ON'
+ - ▁IT
+ - MESSAGE
+ - H
+ - ▁WILL
+ - ▁S
+ - ▁WEEK
+ - ST
+ - C
+ - INFO
+ - EN
+ - CATEGORY
+ - TRAFFIC
+ - ▁F
+ - LE
+ - ▁AND
+ - AR
+ - SEND
+ - RE
+ - ▁P
+ - ▁D
+ - ▁FROM
+ - RECIPIE
+ - PLAY
+ - ▁DO
+ - ▁TRAFFIC
+ - AN
+ - ▁AN
+ - AL
+ - ▁SIX
+ - ▁SONG
+ - ▁ALL
+ - ▁UP
+ - CONTENT
+ - ▁REMINDER
+ - ▁WEEKEND
+ - ▁REMIND
+ - ▁OF
+ - ▁T
+ - RA
+ - ▁WEATHER
+ - ▁SEVEN
+ - ▁PLEASE
+ - ▁RE
+ - ▁TONIGHT
+ - EXACT
+ - ▁EIGHT
+ - ▁W
+ - W
+ - ▁TEN
+ - F
+ - SOURCE
+ - ▁TIME
+ - ESTIMATED
+ - RECURRING
+ - TH
+ - DELETE
+ - VE
+ - ▁NEW
+ - LL
+ - ▁EVERY
+ - ▁PLAY
+ - ES
+ - ▁THIRTY
+ - ▁GET
+ - ▁RAIN
+ - CK
+ - ▁TWO
+ - ▁C
+ - ▁CO
+ - ▁ARE
+ - ▁MESSAGE
+ - RI
+ - ▁G
+ - ▁MORNING
+ - CONTACT
+ - ▁CAN
+ - ▁NOW
+ - ▁THREE
+ - ▁THERE
+ - ET
+ - ▁MUSIC
+ - TER
+ - ▁TAKE
+ - IC
+ - CH
+ - ▁J
+ - V
+ - ED
+ - ▁FOUR
+ - DURATION
+ - LY
+ - ▁E
+ - ▁FRIDAY
+ - UR
+ - ▁YOU
+ - ▁ANY
+ - ▁NINE
+ - ▁GO
+ - UNSUPPORTED
+ - OR
+ - ▁SHOW
+ - ▁O
+ - ▁BA
+ - ▁PA
+ - ▁LONG
+ - AT
+ - ▁ONE
+ - ND
+ - ▁MA
+ - ▁ST
+ - ▁GOING
+ - ▁LIKE
+ - ▁ALARMS
+ - ▁BY
+ - ▁THAT
+ - ▁TWENTY
+ - ▁DAY
+ - ▁CH
+ - ▁MONTH
+ - ▁K
+ - ▁SH
+ - UPDATE
+ - ▁MONDAY
+ - CE
+ - IT
+ - IL
+ - AMOUNT
+ - ▁SATURDAY
+ - ▁BR
+ - ▁NEED
+ - ▁WORK
+ - ID
+ - ▁DRIVE
+ - LA
+ - ▁MO
+ - ▁HAVE
+ - ▁TUESDAY
+ - ▁TELL
+ - IR
+ - HA
+ - ''''
+ - ▁IF
+ - HOME
+ - ▁HE
+ - ▁LO
+ - ▁LA
+ - ▁WHEN
+ - LO
+ - ▁TH
+ - ▁REMINDERS
+ - IE
+ - DISTANCE
+ - ▁WE
+ - ▁SA
+ - ▁HOUR
+ - OULD
+ - NE
+ - DEPARTURE
+ - ▁HI
+ - ▁LI
+ - ARTIST
+ - Z
+ - TRAVEL
+ - ▁OUT
+ - PAUSE
+ - EST
+ - ARRIVAL
+ - ▁CANCEL
+ - ▁MI
+ - ▁OFF
+ - ▁FIFTEEN
+ - POINT
+ - ▁SNOW
+ - NA
+ - EL
+ - ▁EVENTS
+ - ▁CA
+ - ▁SUNDAY
+ - ▁LEAVE
+ - TRACK
+ - ▁SEND
+ - ▁DELETE
+ - ▁APPOINTMENT
+ - ▁BO
+ - RDINAL
+ - ▁MAKE
+ - ▁NEAR
+ - ▁BEFORE
+ - GE
+ - ▁HOME
+ - RELATION
+ - ▁V
+ - FR
+ - ▁THURSDAY
+ - ▁LAST
+ - DIRECTIONS
+ - ▁WEDNESDAY
+ - ▁START
+ - ▁FORECAST
+ - ▁YORK
+ - ▁RIGHT
+ - UM
+ - ▁WITH
+ - USE
+ - ▁MEETING
+ - UT
+ - LI
+ - ▁CHANGE
+ - ▁CAR
+ - GENRE
+ - ATION
+ - X
+ - ▁PICK
+ - ▁WANT
+ - ▁NIGHT
+ - SKIP
+ - ▁DE
+ - ▁RO
+ - ▁ABOUT
+ - MAP
+ - CO
+ - MA
+ - ▁HOUSE
+ - ▁HOT
+ - ▁PARTY
+ - ▁WA
+ - UNIT
+ - ▁HERE
+ - ▁SU
+ - ▁AFTERNOON
+ - ▁MUCH
+ - ▁MOM
+ - ▁TEMPERATURE
+ - EQUENC
+ - ▁ADD
+ - ▁SAN
+ - ▁HER
+ - ▁CONCERTS
+ - ▁CHRISTMAS
+ - ▁DINNER
+ - ▁MAR
+ - LAND
+ - ▁HOURS
+ - ▁CURRENT
+ - ▁TRACK
+ - ▁SOME
+ - ▁CITY
+ - ▁FORTY
+ - ATE
+ - ▁ROUTE
+ - SNOOZE
+ - ▁TEXT
+ - WORK
+ - ▁COLD
+ - RELATED
+ - ▁OR
+ - ▁NO
+ - Q
+ - ▁WAY
+ - WAY
+ - ▁MANY
+ - ▁BIRTHDAY
+ - ▁MINUTE
+ - ▁PLAYLIST
+ - ▁NOON
+ - ▁ROAD
+ - TITLE
+ - PATH
+ - ▁ASK
+ - NAVIGATION
+ - ▁LEFT
+ - ▁ALBUM
+ - ▁TURN
+ - ▁LATE
+ - ▁ELEVEN
+ - NEW
+ - ▁CELSIUS
+ - ▁BUY
+ - AVOID
+ - LOW
+ - NCE
+ - SEARCH
+ - ▁GAME
+ - ▁STOP
+ - ▁JO
+ - ▁FIRST
+ - ▁SHE
+ - ▁DOCTOR
+ - ▁BU
+ - PERIOD
+ - ▁WAKE
+ - CONDITION
+ - ▁EVENING
+ - RADIUS
+ - MODIFIE
+ - ▁REPEAT
+ - ▁SECOND
+ - ▁CONCERT
+ - ▁ANGELES
+ - ▁DOWNTOWN
+ - ▁UMBRELLA
+ - TEMPERATURE
+ - ASH
+ - ▁YEAR
+ - GROUP
+ - ▁DRIVING
+ - ▁GIVE
+ - ▁HUNDRED
+ - ▁HO
+ - ▁MILES
+ - PLAYLIST
+ - ADD
+ - RETRIEV
+ - ▁TWELVE
+ - EAD
+ - ▁CLASS
+ - ▁FREE
+ - PORT
+ - VILLE
+ - ▁BETWEEN
+ - ▁KNOW
+ - ▁AROUND
+ - ▁SCHOOL
+ - ▁NINETY
+ - PROVIDER
+ - SILENCE
+ - RESUME
+ - ▁LET
+ - TION
+ - ▁AUGUST
+ - ▁HAPPENING
+ - ▁AFTER
+ - ▁FAHRENHEIT
+ - ▁EX
+ - ▁VIDEO
+ - ROAD
+ - ▁PARK
+ - ▁CHICAGO
+ - ▁DAILY
+ - ▁CHECK
+ - ▁BEACH
+ - ▁WHERE
+ - ▁JUNE
+ - ▁STREET
+ - ▁FESTIVAL
+ - ▁FLORIDA
+ - ▁JOHN
+ - ▁HAS
+ - ▁SPOTIFY
+ - ▁BILL
+ - RESTART
+ - ▁HIGHWAY
+ - ▁SEATTLE
+ - J
+ - ▁LUNCH
+ - ▁LOOK
+ - ▁FRIEND
+ - ▁COMING
+ - ▁ALERT
+ - IGHT
+ - ▁PANDORA
+ - ▁HEAVY
+ - ▁KIDS
+ - ▁MOVIE
+ - ▁SOUTH
+ - REACT
+ - ▁CONSTRUCTION
+ - PREVIOUS
+ - ▁ORLANDO
+ - ▁OVER
+ - ▁MIAMI
+ - REACTION
+ - ▁ATLANTA
+ - ▁ACCIDENT
+ - ▁COUNTRY
+ - ▁NORTH
+ - ▁LIGHT
+ - RADIO
+ - ▁READ
+ - ▁FAMILY
+ - ▁AIRPORT
+ - ▁EXPECT
+ - ▁DEGREE
+ - ▁PRO
+ - ▁PARTIES
+ - ▁FIFTY
+ - ▁HIGH
+ - ▁PLAN
+ - ▁FOOD
+ - ▁WARM
+ - ▁SUNNY
+ - ▁VEGAS
+ - ▁HOLIDAY
+ - ▁SCHEDULE
+ - ▁STORM
+ - ▁FIFTH
+ - ▁BOSTON
+ - ▁FRANCISCO
+ - ▁LONDON
+ - ATTENDEE
+ - ▁JULY
+ - ▁WALK
+ - ▁COMMUTE
+ - ▁CLEAN
+ - ▁DENTIST
+ - TOWN
+ - ▁AGAIN
+ - ▁DALLAS
+ - ▁PORTLAND
+ - ▁SEPTEMBER
+ - ▁ARRIVE
+ - ▁SISTER
+ - ▁HOUSTON
+ - Ã
+ - É
+ - Í
+ - '*'
+ - Á
+ - Ç
+ - Ó
+ - ']'
+ - '['
+ - Ú
+ - Ü
+ - <sos/eos>
+ transcript_token_list: null
+ two_pass: false
+ pre_postencoder_norm: false
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/en_token_list/bpe_unigram500/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: hubert_large_ll60k
+     download_dir: ./hub
+     multilayer_feature: true
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 30
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_range:
+     - 0
+     - 40
+     num_time_mask: 2
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ preencoder: linear
+ preencoder_conf:
+     input_size: 1024
+     output_size: 80
+ encoder: conformer
+ encoder_conf:
+     output_size: 512
+     attention_heads: 8
+     linear_units: 2048
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     input_layer: conv2d2
+     normalize_before: true
+     macaron_style: true
+     rel_pos_type: latest
+     pos_enc_layer_type: rel_pos
+     selfattention_layer_type: rel_selfattn
+     activation_type: swish
+     use_cnn_module: true
+     cnn_module_kernel: 31
+ postencoder: null
+ postencoder_conf: {}
+ deliberationencoder: null
+ deliberationencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 8
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.1
+     src_attention_dropout_rate: 0.1
+ decoder2: null
+ decoder2_conf: {}
+ postdecoder: null
+ postdecoder_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202205'
+ distributed: true
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
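The training config sets `batch_type: folded` with `batch_size: 128` and `fold_length: [80000, 150]`, so the number of utterances per batch shrinks as inputs get longer. The sketch below assumes the rule used by ESPnet's folded batch sampler (divide the nominal batch size by one plus the largest whole-number ratio of any input's length to its fold length) and assumes speech lengths are counted in raw samples at 16 kHz, as for a `raw` shape file; `folded_batch_size` is a hypothetical helper, not an ESPnet function.

```python
def folded_batch_size(lengths, fold_lengths, batch_size, min_batch_size=1):
    """Approximate batch size for one utterance under folded batching.

    lengths[i] is the length of the i-th input stream (speech samples, BPE
    tokens) and fold_lengths[i] is the matching fold_length from the config.
    """
    # Largest number of whole fold lengths spanned by any of the streams.
    factor = max(length // fold for length, fold in zip(lengths, fold_lengths))
    return max(min_batch_size, batch_size // (1 + factor))


FOLD_LENGTHS = (80000, 150)  # fold_length: speech samples, BPE tokens
BATCH_SIZE = 128             # batch_size
SAMPLE_RATE = 16000          # fs: 16k (assumed unit of the speech lengths)

for seconds, n_tokens in [(3, 40), (6, 40), (12, 90), (20, 160)]:
    n_samples = seconds * SAMPLE_RATE
    print(seconds, "s ->", folded_batch_size((n_samples, n_tokens), FOLD_LENGTHS, BATCH_SIZE))
# 3 s -> 128, 6 s -> 64, 12 s -> 42, 20 s -> 25
```

Under these assumptions, 80000 samples corresponds to 5 seconds of audio, so every additional 5 seconds of speech (or 150 BPE tokens of target) divides the batch further.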
data/en_token_list/bpe_unigram500/bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:27d24b5d25713fdcb0697dfe5f78aaa56d67057bce490b8b0a8641ac63385496
+ size 245232
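`bpe.model` is stored through Git LFS, so the diff shows a three-line pointer (spec `version`, an `oid` written as `algorithm:digest`, and the real file's `size` in bytes) instead of the SentencePiece model itself. A minimal sketch of reading such a pointer, assuming only the `key value` line layout visible here; `parse_lfs_pointer` and `LfsPointer` are illustrative names, not part of the Git LFS tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LfsPointer:
    version: str    # URL of the pointer spec
    hash_algo: str  # e.g. "sha256"
    digest: str     # hex digest of the real file contents
    size: int       # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse a Git LFS pointer made of 'key value' lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return LfsPointer(
        version=fields["version"],
        hash_algo=algo,
        digest=digest,
        size=int(fields["size"]),
    )


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:27d24b5d25713fdcb0697dfe5f78aaa56d67057bce490b8b0a8641ac63385496
size 245232
"""

ptr = parse_lfs_pointer(POINTER)
print(ptr.hash_algo, ptr.size)  # sha256 245232
```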
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/RESULTS.md ADDED
@@ -0,0 +1,38 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Sun Dec 25 13:33:10 EST 2022`
+ - python version: `3.9.5 (default, Jun 4 2021, 12:28:51) [GCC 7.5.0]`
+ - espnet version: `espnet 202205`
+ - pytorch version: `pytorch 1.13.0+cu116`
+ - Git hash: `11890fdd9dd872edc50ce8eb7660d746c6ee160e`
+ - Commit date: `Sat Jun 18 17:05:39 2022 -0400`
+
+ ## asr_train_asr2_hubert_lr0.002_raw_en_bpe500
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave_10best/test|75636|728701|93.9|3.2|2.9|3.1|9.1|29.8|
+ |decode_asr_asr_model_valid.acc.ave_10best/valid|33384|322094|0.0|0.0|100.0|0.0|100.0|100.0|
+ |inference_asr_model_valid.acc.ave_10best/test|75636|728701|93.9|3.3|2.8|3.2|9.4|30.6|
+ |inference_asr_model_valid.acc.ave_10best/valid|33384|322094|0.0|0.0|100.0|0.0|100.0|100.0|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave_10best/test|75636|5745269|95.9|0.9|3.2|3.2|7.3|29.8|
+ |decode_asr_asr_model_valid.acc.ave_10best/valid|33384|2537594|0.0|0.0|100.0|0.0|100.0|100.0|
+ |inference_asr_model_valid.acc.ave_10best/test|75636|5745269|95.9|1.0|3.1|3.3|7.4|30.6|
+ |inference_asr_model_valid.acc.ave_10best/valid|33384|2537594|0.0|0.0|100.0|0.0|100.0|100.0|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave_10best/test|75636|2091389|95.1|1.5|3.4|3.1|8.0|29.8|
+ |decode_asr_asr_model_valid.acc.ave_10best/valid|33384|921077|0.0|0.0|100.0|0.0|100.0|100.0|
+ |inference_asr_model_valid.acc.ave_10best/test|75636|2091389|95.2|1.5|3.3|3.3|8.1|30.6|
+ |inference_asr_model_valid.acc.ave_10best/valid|33384|921077|0.0|0.0|100.0|0.0|100.0|100.0|
+
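The columns of the WER, CER and TER tables follow the usual sclite-style convention, which is assumed here: Corr, Sub, Del, Ins and Err are percentages of the reference units in the `Wrd` column (words, characters or BPE tokens depending on the table), every reference unit is counted as correct, substituted or deleted, and Err = Sub + Del + Ins. The sketch below (hypothetical helpers `error_rate_breakdown` and `row_is_consistent`) computes the breakdown from raw counts and checks a printed row; the tolerance absorbs one-decimal rounding, which is consistent with the `decode.../test` WER row showing Err 9.1 while 3.2 + 2.9 + 3.1 = 9.2. The `valid` rows (Corr 0.0, Del 100.0) satisfy the same identities, with every reference unit scored as a deletion.

```python
def error_rate_breakdown(n_ref, n_sub, n_del, n_ins):
    """Return (Corr, Sub, Del, Ins, Err) as percentages of n_ref reference units."""
    # Each reference unit is either correct, substituted or deleted.
    n_corr = n_ref - n_sub - n_del

    def pct(count):
        return 100.0 * count / n_ref

    # Insertions add errors without consuming reference units, so Err can exceed 100.
    return pct(n_corr), pct(n_sub), pct(n_del), pct(n_ins), pct(n_sub + n_del + n_ins)


def row_is_consistent(corr, sub, dele, ins, err, tol=0.2):
    """Sanity-check a printed table row; tol absorbs rounding to one decimal place."""
    return abs(corr + sub + dele - 100.0) <= tol and abs(sub + dele + ins - err) <= tol


# Hypothetical counts: 10 reference words, 1 substitution, 1 deletion, 1 insertion.
print(error_rate_breakdown(10, 1, 1, 1))  # (80.0, 10.0, 10.0, 10.0, 30.0)

# WER row decode_asr_asr_model_valid.acc.ave_10best/test from the table above.
print(row_is_consistent(93.9, 3.2, 2.9, 3.1, 9.1))  # True
```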
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/config.yaml ADDED
@@ -0,0 +1,709 @@
+ config: conf/train_asr2_hubert_lr0.002.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 57197
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 50
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param:
+ - frontend.upstream
+ num_iters_per_epoch: null
+ batch_size: 128
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_en_bpe500/train/speech_shape
+ - exp/asr_stats_raw_en_bpe500/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_raw_en_bpe500/valid/speech_shape
+ - exp/asr_stats_raw_en_bpe500/valid/text_shape.bpe
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/valid/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/valid/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 0.0004
+     weight_decay: 1.0e-06
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 25000
+ token_list:
+ - <blank>
+ - <unk>
+ - ▁[
+ - ':'
+ - ▁]
+ - _
+ - SL
+ - IN
+ - GET
+ - S
+ - TIME
+ - DATE
+ - ▁THE
+ - ▁TO
+ - ▁FOR
+ - ▁
+ - E
+ - LOCATION
+ - A
+ - WEATHER
+ - O
+ - ▁ME
+ - MUSIC
+ - ▁MY
+ - CREATE
+ - ALARM
+ - Y
+ - D
+ - ▁I
+ - T
+ - ▁AT
+ - I
+ - ▁A
+ - TIMER
+ - ▁IS
+ - U
+ - ▁IN
+ - ▁ON
+ - EVENT
+ - M
+ - ▁TIMER
+ - TODO
+ - REMINDER
+ - R
+ - ▁PM
+ - P
+ - ING
+ - ▁WHAT
+ - ▁THIS
+ - ▁TODAY
+ - ▁AM
+ - N
+ - ▁ALARM
+ - ▁SET
+ - NT
+ - METHOD
+ - ▁TOMORROW
+ - ER
+ - TYPE
+ - B
+ - ATTRIBUTE
+ - DESTINATION
+ - ▁MINUTES
+ - REMINDED
+ - PERSON
+ - L
+ - ▁HOW
+ - NAME
+ - K
+ - ▁FIVE
+ - ▁BE
+ - ▁'
+ - G
+ - ▁NEXT
+ - 'ON'
+ - ▁IT
+ - MESSAGE
+ - H
+ - ▁WILL
+ - ▁S
+ - ▁WEEK
+ - ST
+ - C
+ - INFO
+ - EN
+ - CATEGORY
+ - TRAFFIC
+ - ▁F
+ - LE
+ - ▁AND
+ - AR
+ - SEND
+ - RE
+ - ▁P
+ - ▁D
+ - ▁FROM
+ - RECIPIE
+ - PLAY
+ - ▁DO
+ - ▁TRAFFIC
+ - AN
+ - ▁AN
+ - AL
+ - ▁SIX
+ - ▁SONG
+ - ▁ALL
+ - ▁UP
+ - CONTENT
+ - ▁REMINDER
+ - ▁WEEKEND
+ - ▁REMIND
+ - ▁OF
+ - ▁T
+ - RA
+ - ▁WEATHER
+ - ▁SEVEN
+ - ▁PLEASE
+ - ▁RE
+ - ▁TONIGHT
+ - EXACT
+ - ▁EIGHT
+ - ▁W
+ - W
+ - ▁TEN
+ - F
+ - SOURCE
+ - ▁TIME
+ - ESTIMATED
+ - RECURRING
+ - TH
+ - DELETE
+ - VE
+ - ▁NEW
+ - LL
+ - ▁EVERY
+ - ▁PLAY
+ - ES
+ - ▁THIRTY
+ - ▁GET
+ - ▁RAIN
+ - CK
+ - ▁TWO
+ - ▁C
+ - ▁CO
+ - ▁ARE
+ - ▁MESSAGE
+ - RI
+ - ▁G
+ - ▁MORNING
+ - CONTACT
+ - ▁CAN
+ - ▁NOW
+ - ▁THREE
+ - ▁THERE
+ - ET
+ - ▁MUSIC
+ - TER
+ - ▁TAKE
+ - IC
+ - CH
+ - ▁J
+ - V
+ - ED
+ - ▁FOUR
+ - DURATION
+ - LY
+ - ▁E
+ - ▁FRIDAY
+ - UR
+ - ▁YOU
+ - ▁ANY
+ - ▁NINE
+ - ▁GO
+ - UNSUPPORTED
+ - OR
+ - ▁SHOW
+ - ▁O
+ - ▁BA
+ - ▁PA
+ - ▁LONG
+ - AT
+ - ▁ONE
+ - ND
+ - ▁MA
+ - ▁ST
+ - ▁GOING
+ - ▁LIKE
+ - ▁ALARMS
+ - ▁BY
+ - ▁THAT
+ - ▁TWENTY
+ - ▁DAY
+ - ▁CH
+ - ▁MONTH
+ - ▁K
+ - ▁SH
+ - UPDATE
+ - ▁MONDAY
+ - CE
+ - IT
+ - IL
+ - AMOUNT
+ - ▁SATURDAY
+ - ▁BR
+ - ▁NEED
+ - ▁WORK
+ - ID
+ - ▁DRIVE
+ - LA
+ - ▁MO
+ - ▁HAVE
+ - ▁TUESDAY
+ - ▁TELL
+ - IR
+ - HA
+ - ''''
+ - ▁IF
+ - HOME
+ - ▁HE
+ - ▁LO
+ - ▁LA
+ - ▁WHEN
+ - LO
+ - ▁TH
+ - ▁REMINDERS
+ - IE
+ - DISTANCE
+ - ▁WE
+ - ▁SA
+ - ▁HOUR
+ - OULD
+ - NE
+ - DEPARTURE
+ - ▁HI
+ - ▁LI
+ - ARTIST
+ - Z
+ - TRAVEL
+ - ▁OUT
+ - PAUSE
+ - EST
+ - ARRIVAL
+ - ▁CANCEL
+ - ▁MI
+ - ▁OFF
+ - ▁FIFTEEN
+ - POINT
+ - ▁SNOW
+ - NA
+ - EL
+ - ▁EVENTS
+ - ▁CA
+ - ▁SUNDAY
+ - ▁LEAVE
+ - TRACK
+ - ▁SEND
+ - ▁DELETE
+ - ▁APPOINTMENT
+ - ▁BO
+ - RDINAL
+ - ▁MAKE
+ - ▁NEAR
+ - ▁BEFORE
+ - GE
+ - ▁HOME
+ - RELATION
+ - ▁V
+ - FR
+ - ▁THURSDAY
+ - ▁LAST
+ - DIRECTIONS
+ - ▁WEDNESDAY
+ - ▁START
+ - ▁FORECAST
+ - ▁YORK
+ - ▁RIGHT
+ - UM
+ - ▁WITH
+ - USE
+ - ▁MEETING
+ - UT
+ - LI
+ - ▁CHANGE
+ - ▁CAR
+ - GENRE
+ - ATION
+ - X
+ - ▁PICK
+ - ▁WANT
+ - ▁NIGHT
+ - SKIP
+ - ▁DE
+ - ▁RO
+ - ▁ABOUT
+ - MAP
+ - CO
+ - MA
+ - ▁HOUSE
+ - ▁HOT
+ - ▁PARTY
+ - ▁WA
+ - UNIT
+ - ▁HERE
+ - ▁SU
+ - ▁AFTERNOON
+ - ▁MUCH
+ - ▁MOM
+ - ▁TEMPERATURE
+ - EQUENC
+ - ▁ADD
+ - ▁SAN
+ - ▁HER
+ - ▁CONCERTS
+ - ▁CHRISTMAS
+ - ▁DINNER
+ - ▁MAR
+ - LAND
+ - ▁HOURS
+ - ▁CURRENT
+ - ▁TRACK
+ - ▁SOME
+ - ▁CITY
+ - ▁FORTY
+ - ATE
+ - ▁ROUTE
+ - SNOOZE
+ - ▁TEXT
+ - WORK
+ - ▁COLD
+ - RELATED
+ - ▁OR
+ - ▁NO
+ - Q
+ - ▁WAY
+ - WAY
+ - ▁MANY
+ - ▁BIRTHDAY
+ - ▁MINUTE
+ - ▁PLAYLIST
+ - ▁NOON
+ - ▁ROAD
+ - TITLE
+ - PATH
+ - ▁ASK
+ - NAVIGATION
+ - ▁LEFT
+ - ▁ALBUM
+ - ▁TURN
+ - ▁LATE
+ - ▁ELEVEN
+ - NEW
+ - ▁CELSIUS
+ - ▁BUY
+ - AVOID
+ - LOW
+ - NCE
+ - SEARCH
+ - ▁GAME
+ - ▁STOP
+ - ▁JO
+ - ▁FIRST
+ - ▁SHE
+ - ▁DOCTOR
+ - ▁BU
+ - PERIOD
+ - ▁WAKE
+ - CONDITION
+ - ▁EVENING
+ - RADIUS
+ - MODIFIE
+ - ▁REPEAT
+ - ▁SECOND
+ - ▁CONCERT
+ - ▁ANGELES
+ - ▁DOWNTOWN
+ - ▁UMBRELLA
+ - TEMPERATURE
+ - ASH
+ - ▁YEAR
+ - GROUP
+ - ▁DRIVING
+ - ▁GIVE
+ - ▁HUNDRED
+ - ▁HO
+ - ▁MILES
+ - PLAYLIST
+ - ADD
+ - RETRIEV
+ - ▁TWELVE
+ - EAD
+ - ▁CLASS
+ - ▁FREE
+ - PORT
+ - VILLE
+ - ▁BETWEEN
+ - ▁KNOW
+ - ▁AROUND
+ - ▁SCHOOL
+ - ▁NINETY
+ - PROVIDER
+ - SILENCE
+ - RESUME
+ - ▁LET
+ - TION
+ - ▁AUGUST
+ - ▁HAPPENING
+ - ▁AFTER
+ - ▁FAHRENHEIT
+ - ▁EX
+ - ▁VIDEO
+ - ROAD
+ - ▁PARK
+ - ▁CHICAGO
+ - ▁DAILY
+ - ▁CHECK
+ - ▁BEACH
+ - ▁WHERE
+ - ▁JUNE
+ - ▁STREET
+ - ▁FESTIVAL
+ - ▁FLORIDA
+ - ▁JOHN
+ - ▁HAS
+ - ▁SPOTIFY
+ - ▁BILL
+ - RESTART
+ - ▁HIGHWAY
+ - ▁SEATTLE
+ - J
+ - ▁LUNCH
+ - ▁LOOK
+ - ▁FRIEND
+ - ▁COMING
+ - ▁ALERT
+ - IGHT
+ - ▁PANDORA
+ - ▁HEAVY
+ - ▁KIDS
+ - ▁MOVIE
+ - ▁SOUTH
+ - REACT
+ - ▁CONSTRUCTION
+ - PREVIOUS
+ - ▁ORLANDO
+ - ▁OVER
+ - ▁MIAMI
+ - REACTION
+ - ▁ATLANTA
+ - ▁ACCIDENT
+ - ▁COUNTRY
+ - ▁NORTH
+ - ▁LIGHT
+ - RADIO
+ - ▁READ
+ - ▁FAMILY
+ - ▁AIRPORT
+ - ▁EXPECT
+ - ▁DEGREE
+ - ▁PRO
+ - ▁PARTIES
+ - ▁FIFTY
+ - ▁HIGH
+ - ▁PLAN
+ - ▁FOOD
+ - ▁WARM
+ - ▁SUNNY
+ - ▁VEGAS
+ - ▁HOLIDAY
+ - ▁SCHEDULE
+ - ▁STORM
+ - ▁FIFTH
+ - ▁BOSTON
+ - ▁FRANCISCO
+ - ▁LONDON
+ - ATTENDEE
+ - ▁JULY
+ - ▁WALK
+ - ▁COMMUTE
+ - ▁CLEAN
+ - ▁DENTIST
+ - TOWN
+ - ▁AGAIN
+ - ▁DALLAS
+ - ▁PORTLAND
+ - ▁SEPTEMBER
+ - ▁ARRIVE
+ - ▁SISTER
+ - ▁HOUSTON
+ - Ã
+ - É
+ - Í
+ - '*'
+ - Á
+ - Ç
+ - Ó
+ - ']'
+ - '['
+ - Ú
+ - Ü
+ - <sos/eos>
+ transcript_token_list: null
+ two_pass: false
+ pre_postencoder_norm: false
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/en_token_list/bpe_unigram500/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: hubert_large_ll60k
+     download_dir: ./hub
+     multilayer_feature: true
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 30
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_range:
+     - 0
+     - 40
+     num_time_mask: 2
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ preencoder: linear
+ preencoder_conf:
+     input_size: 1024
+     output_size: 80
+ encoder: conformer
+ encoder_conf:
+     output_size: 512
+     attention_heads: 8
+     linear_units: 2048
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     input_layer: conv2d2
+     normalize_before: true
+     macaron_style: true
+     rel_pos_type: latest
+     pos_enc_layer_type: rel_pos
+     selfattention_layer_type: rel_selfattn
+     activation_type: swish
+     use_cnn_module: true
+     cnn_module_kernel: 31
+ postencoder: null
+ postencoder_conf: {}
+ deliberationencoder: null
+ deliberationencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 8
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.1
+     src_attention_dropout_rate: 0.1
+ decoder2: null
+ decoder2_conf: {}
+ postdecoder: null
+ postdecoder_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202205'
+ distributed: true
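With `scheduler: warmuplr` and `scheduler_conf.warmup_steps: 25000`, the `lr: 0.0004` in `optim_conf` acts as the peak learning rate rather than a constant. The sketch below assumes the Noam-style formula used by ESPnet's `WarmupLR`, stepped once per optimizer update with a 1-based step count; `warmup_lr` is an illustrative re-implementation, not the ESPnet class.

```python
def warmup_lr(step, base_lr=4e-4, warmup_steps=25000):
    """Learning rate at a 1-based optimizer step under Noam-style warmup.

    Rises linearly to base_lr at step == warmup_steps, then decays as 1/sqrt(step).
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


for step in (1000, 12500, 25000, 100000):
    print(step, f"{warmup_lr(step):.2e}")
# 1000 1.60e-05
# 12500 2.00e-04
# 25000 4.00e-04
# 100000 2.00e-04
```

The two branches of `min` cross exactly at `warmup_steps`, which is why the rate is the same halfway up the warmup (step 12500) and four times past it (step 100000).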
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/images/acc.png ADDED
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/images/backward_time.png ADDED
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/images/cer.png ADDED
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/images/cer_ctc.png ADDED
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/images/forward_time.png ADDED
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/images/iter_time.png ADDED
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/images/loss.png ADDED
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/images/loss_att.png ADDED
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/images/loss_ctc.png ADDED
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/images/optim0_lr0.png ADDED
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/images/optim_step_time.png ADDED
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/images/train_time.png ADDED
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/images/wer.png ADDED
exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/valid.acc.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:033429e1cae3fe196282f17888ec71b9cae5b7b0fe2fac84ab6c717d4ff2f1b3
+ size 1723018419
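The averaged checkpoint `valid.acc.ave_10best.pth` is also an LFS pointer, to a file of 1,723,018,419 bytes (about 1.7 GB). After fetching the real file, a quick integrity check is to compare its size and SHA-256 digest with the pointer. This is a minimal sketch using only the standard library; `verify_against_pointer` is a hypothetical helper, and the path assumes the repository layout shown in this commit.

```python
import hashlib
import os


def verify_against_pointer(path: str, expected_sha256: str, expected_size: int) -> bool:
    """Return True if the file at `path` matches an LFS pointer's size and sha256."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first; avoids hashing a truncated download
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so a multi-GB checkpoint is never fully in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256


# Values copied from the pointer above.
CKPT = "exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/valid.acc.ave_10best.pth"
CKPT_SHA256 = "033429e1cae3fe196282f17888ec71b9cae5b7b0fe2fac84ab6c717d4ff2f1b3"
CKPT_SIZE = 1723018419

if os.path.exists(CKPT):
    print("checkpoint OK:", verify_against_pointer(CKPT, CKPT_SHA256, CKPT_SIZE))
```

If the path still holds the three-line pointer (for example, when the repository was cloned without `git lfs pull`), the size check fails immediately.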
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202205'
+ files:
+     asr_model_file: exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/valid.acc.ave_10best.pth
+ python: "3.9.5 (default, Jun 4 2021, 12:28:51) \n[GCC 7.5.0]"
+ timestamp: 1672045757.354614
+ torch: 1.13.0+cu116
+ yaml_files:
+     asr_train_config: exp/asr_train_asr2_hubert_lr0.002_raw_en_bpe500/config.yaml