ms-dot-k committed on
Commit
e2337d9
1 Parent(s): 9a14813

Add model files

Files changed (24)
  1. README.md +1288 -1
  2. data/en_token_list/bpe_unigram1000/bpe.model +3 -0
  3. exp/asr_stats_extracted_en_bpe1000/train/feats_stats.npz +3 -0
  4. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/RESULTS.md +29 -0
  5. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/config.yaml +1187 -0
  6. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/acc.png +0 -0
  7. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/backward_time.png +0 -0
  8. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/cer.png +0 -0
  9. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/cer_ctc.png +0 -0
  10. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/clip.png +0 -0
  11. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/forward_time.png +0 -0
  12. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/gpu_max_cached_mem_GB.png +0 -0
  13. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/grad_norm.png +0 -0
  14. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/iter_time.png +0 -0
  15. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/loss.png +0 -0
  16. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/loss_att.png +0 -0
  17. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/loss_ctc.png +0 -0
  18. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/loss_scale.png +0 -0
  19. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/optim0_lr0.png +0 -0
  20. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/optim_step_time.png +0 -0
  21. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/train_time.png +0 -0
  22. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/wer.png +0 -0
  23. exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/valid.acc.ave_10best.pth +3 -0
  24. meta.yaml +7 -0
README.md CHANGED
@@ -1,3 +1,1290 @@
  ---
- license: apache-2.0
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: en
+ datasets:
+ - lrs3
+ license: cc-by-4.0
  ---
+
+ ## ESPnet2 AVSR model
+
+ ### `espnet/msk_lrs3_train_avsr_avhubert_large_extracted_en_bpe1000`
+
+ This model was trained by ms-dot-k using the lrs3 recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ ```bash
+ cd espnet
+ pip install -e .
+ cd egs2/lrs3/avsr1
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/msk_lrs3_train_avsr_avhubert_large_extracted_en_bpe1000
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Thu Sep 28 23:59:06 KST 2023`
+ - python version: `3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]`
+ - espnet version: `espnet 202308`
+ - pytorch version: `pytorch 1.12.0`
+ - Git hash: `5d0758e2a7063b82d1f10a8ac2de98eb6cf8a352`
+ - Commit date: `Wed Aug 30 18:03:42 2023 -0400`
+
+ ## exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |inference_asr_model_valid.acc.ave/test|1321|9890|98.5|1.1|0.4|0.2|1.7|8.8|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |inference_asr_model_valid.acc.ave/test|1321|49750|99.4|0.2|0.4|0.2|0.8|8.8|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |inference_asr_model_valid.acc.ave/test|1321|14940|98.8|0.8|0.4|0.3|1.5|8.8|
+
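As a quick sanity check on the WER, CER, and TER tables, the columns appear to follow the usual sclite-style convention: `Err` is the sum of `Sub`, `Del`, and `Ins`, and `Corr` is 100 minus `Sub` and `Del`. The sketch below recomputes both from the reported percentages. The dictionary layout and the function name `recompute` are illustrative assumptions, not part of ESPnet; the numbers are copied from the tables.

```python
# Percentages copied from the WER/CER/TER tables (test set, 1321 sentences).
ROWS = {
    "WER": {"corr": 98.5, "sub": 1.1, "del": 0.4, "ins": 0.2, "err": 1.7},
    "CER": {"corr": 99.4, "sub": 0.2, "del": 0.4, "ins": 0.2, "err": 0.8},
    "TER": {"corr": 98.8, "sub": 0.8, "del": 0.4, "ins": 0.3, "err": 1.5},
}


def recompute(row):
    """Recompute (corr, err) from the sub/del/ins columns (hypothetical helper)."""
    corr = 100.0 - row["sub"] - row["del"]
    err = row["sub"] + row["del"] + row["ins"]
    # The tables are rounded to one decimal place, so round the same way.
    return round(corr, 1), round(err, 1)


for name, row in ROWS.items():
    corr, err = recompute(row)
    matches = (corr, err) == (row["corr"], row["err"])
    print(f"{name}: Corr={corr} Err={err} matches_table={matches}")
```

Because the published values are rounded, a recomputed sum could differ from a table entry by 0.1 in general; for these three rows the values agree.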
+
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/train_avsr_avhubert_large.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: false
+ dry_run: false
+ iterator_type: sequence
+ valid_iterator_type: null
+ output_dir: exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 54927
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 20
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 16
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_extracted_en_bpe1000/train/speech_shape
+ - exp/asr_stats_extracted_en_bpe1000/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_extracted_en_bpe1000/valid/speech_shape
+ - exp/asr_stats_extracted_en_bpe1000/valid/text_shape.bpe
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 800
+ - 150
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ train_data_path_and_name_and_type:
+ -   - dump/extracted/train/feats.scp
+     - speech
+     - kaldi_ark
+ -   - dump/extracted/train/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/extracted/val/feats.scp
+     - speech
+     - kaldi_ark
+ -   - dump/extracted/val/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.0003
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 8000
+ token_list:
+ - <blank>
+ - <unk>
+ - S
+ - ▁THE
+ - ▁TO
+ - ▁A
+ - ▁AND
+ - T
+ - ▁I
+ - ''''
+ - ▁OF
+ - ▁THAT
+ - ▁IN
+ - ING
+ - D
+ - ▁YOU
+ - ▁WE
+ - E
+ - ▁IT
+ - N
+ - ED
+ - ▁IS
+ - R
+ - M
+ - P
+ - Y
+ - ▁FOR
+ - ER
+ - ▁THIS
+ - ▁WAS
+ - RE
+ - C
+ - G
+ - ▁SO
+ - A
+ - ▁BE
+ - ▁THEY
+ - ▁HAVE
+ - ▁ARE
+ - O
+ - ▁
+ - ▁ON
+ - ▁WITH
+ - LY
+ - ▁WHAT
+ - U
+ - IN
+ - AL
+ - ▁MY
+ - I
+ - ▁S
+ - ▁DO
+ - B
+ - ▁RE
+ - L
+ - ▁ME
+ - ▁CAN
+ - ▁BUT
+ - LE
+ - ▁ABOUT
+ - OR
+ - ▁NOT
+ - VE
+ - F
+ - AR
+ - RA
+ - ▁ALL
+ - ▁OUR
+ - ▁PEOPLE
+ - ▁AT
+ - ▁C
+ - ▁AS
+ - IC
+ - ▁OR
+ - ▁LIKE
+ - W
+ - LL
+ - K
+ - ▁AN
+ - ▁THERE
+ - ENT
+ - ▁ONE
+ - ES
+ - ▁HE
+ - RI
+ - 'ON'
+ - ▁P
+ - ▁IF
+ - ▁FROM
+ - ▁JUST
+ - ▁WHEN
+ - TH
+ - ▁YOUR
+ - ▁US
+ - CE
+ - ▁DE
+ - ION
+ - IT
+ - ▁KNOW
+ - ▁HOW
+ - ▁T
+ - ▁BECAUSE
+ - CH
+ - V
+ - ▁OUT
+ - ▁B
+ - ▁UP
+ - ▁E
+ - ▁F
+ - TE
+ - ▁HAD
+ - ▁CO
+ - LI
+ - ▁TIME
+ - ▁THEIR
+ - ▁MORE
+ - UR
+ - ▁WHO
+ - ▁GO
+ - EN
+ - ▁G
+ - ATION
+ - AN
+ - CK
+ - TER
+ - ▁SEE
+ - ▁WOULD
+ - ▁THESE
+ - ▁NO
+ - ▁THEM
+ - ▁BY
+ - ▁THINK
+ - ▁WERE
+ - IL
+ - ATE
+ - ▁GET
+ - ▁SE
+ - ▁VERY
+ - ▁GOING
+ - ▁EX
+ - ▁REALLY
+ - ITY
+ - ▁WAY
+ - ▁CON
+ - H
+ - RO
+ - ▁DON
+ - ▁NOW
+ - ▁W
+ - X
+ - NE
+ - GE
+ - ▁WILL
+ - ▁MAKE
+ - ▁WANT
+ - ▁OTHER
+ - ▁SOME
+ - LA
+ - ▁WORLD
+ - ▁ST
+ - ▁COULD
+ - TION
+ - ▁WORK
+ - MENT
+ - ▁SHE
+ - ▁NEED
+ - ▁PA
+ - LO
+ - OL
+ - ▁SAY
+ - ▁MO
+ - ▁BA
+ - IST
+ - ▁FA
+ - IR
+ - ▁MA
+ - ERS
+ - ▁HAS
+ - VER
+ - ▁PO
+ - IVE
+ - ▁PRO
+ - ▁LIFE
+ - ▁INTO
+ - ▁WHICH
+ - ▁THINGS
+ - ▁WHERE
+ - ND
+ - ▁LA
+ - MP
+ - ▁BEEN
+ - ▁SOMETHING
+ - MA
+ - ▁THOSE
+ - US
+ - ▁NEW
+ - ▁CH
+ - ▁RA
+ - ▁ACTUALLY
+ - ▁YEARS
+ - ▁EVEN
+ - ▁TAKE
+ - ▁LOOK
+ - UL
+ - ▁RIGHT
+ - ▁SAID
+ - TIC
+ - ▁UN
+ - Z
+ - AS
+ - ▁DAY
+ - ▁HER
+ - IDE
+ - ▁BO
+ - ▁THAN
+ - ▁HERE
+ - ▁OVER
+ - ▁BACK
+ - ▁LO
+ - ▁FIRST
+ - ▁DI
+ - ▁MOST
+ - ▁COME
+ - ▁ALSO
+ - VI
+ - KE
+ - ▁WELL
+ - IES
+ - ABLE
+ - UT
+ - ▁THEN
+ - ▁CHANGE
+ - AGE
+ - ▁MUCH
+ - '0'
+ - ▁MEAN
+ - OM
+ - ▁CA
+ - CO
+ - AT
+ - ▁ANY
+ - ▁HAPPEN
+ - ▁ONLY
+ - ▁PART
+ - ▁SU
+ - ▁HIS
+ - ▁SP
+ - ▁DIS
+ - ANCE
+ - ID
+ - ▁MANY
+ - ▁RO
+ - '}'
+ - ▁{
+ - OW
+ - ▁O
+ - IGHT
+ - ▁GOOD
+ - UM
+ - ▁LIVE
+ - ▁LOT
+ - ▁D
+ - ▁TWO
+ - ▁LI
+ - ▁THING
+ - ▁GOT
+ - ▁TELL
+ - AC
+ - ▁EVERY
+ - EL
+ - CI
+ - ▁WHY
+ - TA
+ - FUL
+ - ▁BEING
+ - ANT
+ - EST
+ - ▁LEARN
+ - ▁COMP
+ - ▁DID
+ - URE
+ - PE
+ - ▁FEEL
+ - ▁DIFFERENT
+ - ▁PRE
+ - MO
+ - TI
+ - ▁HO
+ - ▁K
+ - ▁LITTLE
+ - IV
+ - ▁THROUGH
+ - ▁1
+ - INE
+ - ▁KIND
+ - ME
+ - RY
+ - ▁LET
+ - ▁HELP
+ - UN
+ - ICAL
+ - ▁VI
+ - ▁SAME
+ - ECT
+ - ▁HUMAN
+ - ▁GIVE
+ - HE
+ - ▁TALK
+ - ▁FE
+ - ▁HA
+ - ▁OWN
+ - ▁AROUND
+ - ▁USE
+ - IS
+ - ALLY
+ - ▁IDEA
+ - RESS
+ - ▁PROBLEM
+ - ▁PERSON
+ - ▁TE
+ - ▁FI
+ - ▁FIND
+ - ▁SA
+ - ▁START
+ - OS
+ - TED
+ - ▁BU
+ - LG
+ - NCE
+ - ATED
+ - ▁YEAR
+ - ▁DIDN
+ - ▁LOVE
+ - HO
+ - '5'
+ - ▁DOWN
+ - ▁SCHOOL
+ - ▁TODAY
+ - ▁QUESTION
+ - ▁HEAR
+ - DI
+ - ▁MAN
+ - ▁CAR
+ - MI
+ - ▁GREAT
+ - ▁CR
+ - ▁DOING
+ - IG
+ - ▁FACT
+ - ▁LE
+ - ▁LONG
+ - OUS
+ - ▁RU
+ - ▁PUT
+ - ▁AFTER
+ - ▁EN
+ - ▁M
+ - ▁GA
+ - ▁SHOW
+ - OP
+ - ▁SI
+ - ▁SHOULD
+ - ▁NE
+ - ▁STA
+ - ▁NEVER
+ - ▁BIG
+ - NS
+ - ▁THOUGHT
+ - ISH
+ - ▁MIGHT
+ - ▁ACT
+ - ▁PLACE
+ - LU
+ - END
+ - IZE
+ - ▁REAL
+ - ▁BETTER
+ - ATIVE
+ - IA
+ - ▁UNDERSTAND
+ - ▁POWER
+ - ▁IMPORTANT
+ - IAN
+ - ▁BRAIN
+ - ▁SYSTEM
+ - UAL
+ - NESS
+ - ▁END
+ - ▁ABLE
+ - ▁BEFORE
+ - ▁STORY
+ - ▁OFF
+ - TOR
+ - FF
+ - ▁STARTED
+ - ▁DR
+ - ▁MADE
+ - ▁ASK
+ - NA
+ - ▁HU
+ - ▁CREATE
+ - ATING
+ - ▁BI
+ - ARY
+ - ▁HIGH
+ - ▁HIM
+ - BO
+ - ITION
+ - ▁THREE
+ - ▁PER
+ - ▁AM
+ - ▁CALLED
+ - ▁APP
+ - ▁CAME
+ - ▁WOMEN
+ - ▁OLD
+ - TY
+ - ▁PLAY
+ - '4'
+ - PP
+ - ▁PH
+ - AG
+ - ▁BELIEVE
+ - ▁HOME
+ - ARD
+ - ▁FRIEND
+ - ▁RI
+ - ▁FOUND
+ - HA
+ - ▁HAND
+ - ▁DA
+ - ▁STILL
+ - ▁NA
+ - ▁WORD
+ - ▁TRANS
+ - ▁HEALTH
+ - OUND
+ - ▁BUILD
+ - ▁CARE
+ - ▁WI
+ - ▁NEXT
+ - ▁THANK
+ - ▁TURN
+ - ▁TOGETHER
+ - ▁TA
+ - ▁BECOME
+ - ▁EXPERIENCE
+ - VING
+ - ▁EM
+ - ▁MEN
+ - ISE
+ - ▁MAR
+ - ▁EACH
+ - ▁WENT
+ - ▁TRI
+ - ▁POINT
+ - ▁LAST
+ - ▁MAYBE
+ - ▁EVER
+ - ▁CALL
+ - WARD
+ - ▁CHILDREN
+ - ▁DOES
+ - CA
+ - ▁BIT
+ - UC
+ - LIC
+ - UGH
+ - ▁EXAMPLE
+ - ▁FEW
+ - ITIES
+ - ▁ANOTHER
+ - SH
+ - ▁TH
+ - ▁ALWAYS
+ - ▁H
+ - ▁READ
+ - ▁INTEREST
+ - FORM
+ - ▁STATE
+ - ▁MOVE
+ - IOUS
+ - ▁MIND
+ - 'NO'
+ - AM
+ - ▁TEACH
+ - ▁2
+ - ▁HARD
+ - ▁WANTED
+ - ▁20
+ - ▁GROW
+ - ▁JOB
+ - DA
+ - ▁TOO
+ - ▁VA
+ - OME
+ - ▁MAY
+ - '8'
+ - ▁SOCIAL
+ - ▁HI
+ - ▁FOOD
+ - BI
+ - ▁JO
+ - ▁COURSE
+ - ▁FR
+ - BA
+ - ▁MOMENT
+ - ▁AGAIN
+ - ▁DOESN
+ - ▁SHARE
+ - ▁AWAY
+ - ▁BETWEEN
+ - ▁LESS
+ - ▁SHA
+ - ▁MONEY
+ - ▁UNDER
+ - BER
+ - ▁DEVELOP
+ - ▁SECOND
+ - ▁NUMBER
+ - ▁ART
+ - QUE
+ - ▁FAMILY
+ - '1'
+ - '7'
+ - ▁SH
+ - '6'
+ - ▁EVERYTHING
+ - ▁FAR
+ - ▁WORKING
+ - ▁KIDS
+ - ▁PLAN
+ - ▁CHA
+ - ▁AGO
+ - ▁PI
+ - ▁ENOUGH
+ - ISM
+ - ▁AMERICA
+ - ▁THINKING
+ - ▁USED
+ - ▁REASON
+ - ▁TRY
+ - ▁SOMEONE
+ - ▁GENE
+ - ▁CU
+ - ▁STUDENT
+ - ▁TOLD
+ - ▁GU
+ - ▁TRYING
+ - ▁LEAD
+ - ▁MYSELF
+ - ▁BEST
+ - ▁FUTURE
+ - ▁MILLION
+ - ▁SMALL
+ - ▁TECHNOLOGY
+ - LESS
+ - ▁PASS
+ - ▁DONE
+ - ▁YOUNG
+ - '9'
+ - ▁SPACE
+ - ▁WATER
+ - ▁MATTER
+ - ▁OPEN
+ - ▁COUNTRY
+ - ▁REMEMBER
+ - ▁TALKING
+ - ▁REALIZE
+ - LAND
+ - ▁RESEARCH
+ - Q
+ - IAL
+ - ▁WAR
+ - ▁GROUP
+ - ▁BOOK
+ - ▁KEEP
+ - ▁DEF
+ - ▁STOP
+ - ▁HOPE
+ - ▁CONNECT
+ - ▁SENSE
+ - ▁ANSWER
+ - ▁WALK
+ - ▁DESIGN
+ - ▁WEEK
+ - ▁LANGUAGE
+ - ▁DATA
+ - ▁LOOKING
+ - ▁PERCENT
+ - ADE
+ - ▁CLASS
+ - ▁WHOLE
+ - ▁BODY
+ - ▁FOUR
+ - ▁OFTEN
+ - ▁ELSE
+ - ▁WITHOUT
+ - ▁PROCESS
+ - ▁FREE
+ - ▁MAKING
+ - IBLE
+ - ▁BRING
+ - ▁CHILD
+ - ▁GETTING
+ - ▁PROBABLY
+ - ▁ALLOW
+ - ▁SPEAK
+ - ▁COMMUNITY
+ - ▁HAVING
+ - ▁TOOK
+ - ▁OP
+ - ▁JU
+ - ▁MU
+ - ▁FACE
+ - ▁INFORMATION
+ - ABILITY
+ - ▁NAME
+ - ▁NI
+ - '2'
+ - ▁GIRL
+ - ▁CELL
+ - ▁ANYTHING
+ - ▁SCIENCE
+ - ▁STAND
+ - ▁WHILE
+ - ▁SUCH
+ - '000'
+ - ▁CASE
+ - J
+ - ANG
+ - ▁FIVE
+ - ▁GUY
+ - ▁FUN
+ - ▁BUSINESS
+ - ▁ROOM
+ - ▁SELF
+ - ▁LIVING
+ - ▁SURE
+ - ▁IMAGINE
+ - ▁ASKED
+ - ▁MIS
+ - ▁ENERGY
+ - ▁PROJECT
+ - ▁STUDY
+ - ▁DREAM
+ - ▁10
+ - ▁STORIES
+ - ▁ALREADY
+ - ▁TERM
+ - ▁EFFECT
+ - ▁KNEW
+ - ▁SOCIETY
+ - ▁PRODUCT
+ - ▁PRETTY
+ - ▁EVERYONE
+ - ▁HEAD
+ - ▁19
+ - ▁JA
+ - ▁LIGHT
+ - ▁LISTEN
+ - ▁MUSIC
+ - ▁LARGE
+ - ▁QUITE
+ - ▁J
+ - ▁BOTH
+ - ▁CHALLENGE
+ - ▁SORT
+ - ▁FELT
+ - ▁TREAT
+ - ▁EDUCATION
+ - ▁WRONG
+ - ▁YOURSELF
+ - ▁MIL
+ - ▁OURSELVES
+ - ▁SOUND
+ - ▁PROGRAM
+ - ▁3
+ - ▁CLOSE
+ - ▁QUA
+ - ▁SINGLE
+ - ▁MINUTE
+ - ▁NOTHING
+ - ▁ENVIRONMENT
+ - ▁PUBLIC
+ - ▁ORDER
+ - ▁OB
+ - ▁TRUE
+ - ▁STEP
+ - ▁WONDER
+ - ▁NIGHT
+ - ▁YET
+ - ▁EYE
+ - ▁LEFT
+ - SHIP
+ - ▁VALUE
+ - ▁WHETHER
+ - ▁MOTHER
+ - ▁SIMPLE
+ - ▁NU
+ - ▁WOMAN
+ - ▁LU
+ - ▁CONTROL
+ - ▁COMING
+ - ▁SAW
+ - ▁LEVEL
+ - ▁TEST
+ - ▁POSSIBLE
+ - ▁ACROSS
+ - ▁HOUSE
+ - ▁WATCH
+ - ▁GOVERNMENT
+ - ▁PARENTS
+ - ▁HALF
+ - ▁TEN
+ - ▁DEEP
+ - ▁CANCER
+ - ▁ISSUE
+ - ▁LATER
+ - ▁SOMETIMES
+ - ▁ANIMAL
+ - ▁SUPPORT
+ - ▁EAT
+ - ▁CULTURE
+ - ▁FULL
+ - ▁INSTEAD
+ - ▁EARTH
+ - ▁DISEASE
+ - ▁MIN
+ - ▁GAME
+ - ▁DECIDED
+ - ▁ALMOST
+ - ▁SUCCESS
+ - ▁AMAZING
+ - ▁DRIVE
+ - ▁DU
+ - ▁EMOTION
+ - ▁GLOBAL
+ - ▁EQU
+ - ▁PLANET
+ - ▁CERTAIN
+ - ▁HISTORY
+ - ▁MEET
+ - ▁TRAIN
+ - ▁COMPUTER
+ - ▁BECAME
+ - ▁TEAM
+ - ▁DISCOVER
+ - ▁DIFFERENCE
+ - WAY
+ - ▁FOCUS
+ - ▁PAST
+ - ▁RESULT
+ - ▁MONTHS
+ - ▁MODEL
+ - ▁YES
+ - ▁VO
+ - ▁COUNTRIES
+ - ▁STUFF
+ - ▁FIGURE
+ - ▁30
+ - ▁PATIENT
+ - ▁SPEND
+ - ▁ENTIRE
+ - ▁INDIVIDUAL
+ - ▁UNTIL
+ - ▁THOUGH
+ - ▁DECISION
+ - ▁CHOICE
+ - ▁AFRICA
+ - ▁RELATIONSHIP
+ - ▁BREAK
+ - ▁SOMEBODY
+ - ▁FOLLOW
+ - ▁CONVERSATION
+ - ▁LEAVE
+ - ▁THOUSAND
+ - ▁SIGN
+ - ▁SINCE
+ - ▁DIFFICULT
+ - ▁IMPACT
+ - ▁HOURS
+ - ▁COUPLE
+ - ▁CAUSE
+ - ▁PARTICULAR
+ - ▁DOCTOR
+ - ▁TAKING
+ - ▁COMPANY
+ - ▁EVERYBODY
+ - ▁50
+ - ▁DIRECT
+ - ▁EXPECT
+ - ▁200
+ - ▁ORGAN
+ - ▁EXACTLY
+ - ▁THEMSELVES
+ - ▁HAPPY
+ - ▁MUST
+ - ▁SAFE
+ - ▁BASED
+ - ▁BEAUTIFUL
+ - ▁PHONE
+ - ▁AGAINST
+ - ▁WRITE
+ - ▁DRUG
+ - ▁PICTURE
+ - ▁MEDIA
+ - ▁WAIT
+ - ▁FRONT
+ - ▁RISK
+ - ▁BEHAVIOR
+ - ▁BLACK
+ - ▁100
+ - ▁NATURE
+ - ▁ORGANIZATION
+ - ▁HUNDRED
+ - ▁EASY
+ - ▁ACCESS
+ - ▁HOLD
+ - ▁COMMON
+ - ▁MARKET
+ - ▁GRAND
+ - ▁VOICE
+ - ▁DEATH
+ - ▁PIECE
+ - ▁BILLION
+ - ▁LEAST
+ - ▁DURING
+ - '3'
+ - ▁NATURAL
+ - ▁TYPE
+ - ▁INVEST
+ - ▁GENERATION
+ - ENCY
+ - ▁STRONG
+ - OLOGICAL
+ - ▁CLEAR
+ - ▁PRESENT
+ - ▁INTERNET
+ - ▁KILL
+ - OLOGY
+ - ▁SUPER
+ - ▁UNITED
+ - ▁IMAGE
+ - ▁RATHER
+ - ▁SOLUTION
+ - ▁ECONOMIC
+ - ▁PROTECT
+ - ▁BEHIND
+ - ▁COLLECT
+ - ▁SCIENTIST
+ - UDE
+ - ▁PRODUCE
+ - ▁PERFECT
+ - ▁DOLLARS
+ - ▁VIEW
+ - ▁CONSIDER
+ - ▁THIRD
+ - ▁MACHINE
+ - ▁OUTSIDE
+ - ▁SKILL
+ - ▁EXPERIMENT
+ - ▁COLLEGE
+ - ▁QUI
+ - ▁OPPORTUNITY
+ - ▁LOCAL
+ - ▁SIMPLY
+ - ▁EARLY
+ - ▁MAJOR
+ - ▁CANNOT
+ - ▁PHYSICAL
+ - ▁WHATEVER
+ - ▁MIDDLE
+ - ▁VIDEO
+ - ▁ALONG
+ - OGRAPH
+ - ▁SOLVE
+ - ▁KEY
+ - ▁TRUST
+ - ▁FIELD
+ - HOOD
+ - ▁ATTENTION
+ - ▁MICRO
+ - ▁SHORT
+ - ▁SITUATION
+ - ▁STREET
+ - ▁COMPANIES
+ - ▁POLITICAL
+ - ▁NORMAL
+ - ▁AMOUNT
+ - ▁SERVICE
+ - ▁OBJECT
+ - ▁POTENTIAL
+ - ▁COLOR
+ - ▁KNOWLEDGE
+ - ▁MORNING
+ - ▁TRUTH
+ - ▁UNIVERSITY
+ - ▁PROVIDE
+ - ▁RESOURCE
+ - ▁POSITIVE
+ - ▁EUROPE
+ - ▁SPECIAL
+ - ▁CONTINUE
+ - ▁BASICALLY
+ - ▁SMART
+ - ▁PRACTICE
+ - ▁POPULATION
+ - ▁TRAVEL
+ - ▁AFFECT
+ - ▁FINALLY
+ - ▁APPROACH
+ - ▁COUNT
+ - ▁PERHAPS
+ - ▁INTERACT
+ - ▁EXPLAIN
+ - ▁ENGINEER
+ - ▁ENGAGE
+ - ▁SITTING
+ - ▁OFFICE
+ - ▁COMPLEX
+ - ▁WHITE
+ - ▁GENDER
+ - ▁MESSAGE
+ - ▁WORTH
+ - ▁ITSELF
+ - IZATION
+ - ▁BUILT
+ - ▁IMPROVE
+ - ▁OKAY
+ - ▁PRISON
+ - ▁MATERIAL
+ - ▁NETWORK
+ - ▁EITHER
+ - ▁GIVING
+ - ▁LIMIT
+ - ▁MEASURE
+ - ▁DARK
+ - ▁AUDIENCE
+ - ▁ACCEPT
+ - ▁RECORD
+ - ▁OCEAN
+ - ▁CHOOSE
+ - ▁SPECIES
+ - ▁YORK
+ - ▁SUSTAIN
+ - ▁SLEEP
+ - ▁OBVIOUS
+ - ▁HOSPITAL
+ - ▁PERSPECTIVE
+ - ▁INCREASE
+ - ▁OPERA
+ - ▁TAUGHT
+ - ▁MULTI
+ - ▁CHANGING
+ - ▁JOURNEY
+ - ▁INDUSTRY
+ - ▁NEURO
+ - ▁REQUIRE
+ - ▁DECADE
+ - ▁CURRENT
+ - ▁PUSH
+ - ▁BENEFIT
+ - ▁YEAH
+ - ▁BLOOD
+ - ▁SCALE
+ - ▁ESPECIALLY
+ - ▁COMMUNITIES
+ - ▁ADULT
+ - ▁CHARACTER
+ - ▁REPRESENT
+ - IFIED
+ - ▁SUFFER
+ - ▁RECOGNIZE
+ - ▁CENTURY
+ - ▁SUDDEN
+ - ▁FUNCTION
+ - ▁ACHIEVE
+ - ▁SIMILAR
+ - ▁BROUGHT
+ - ▁TRADITION
+ - ▁UNIVERSE
+ - ▁CLIMATE
+ - ▁BREATH
+ - ▁EXTREME
+ - ▁REPORT
+ - ▁DAUGHTER
+ - ▁COMFORT
+ - ▁CONCEPT
+ - ▁ECONOMY
+ - ▁INNOVATION
+ - ▁QUICKLY
+ - ▁SUGGEST
+ - ▁SPECIFIC
+ - ▁CRAZY
+ - ▁CONSCIOUS
+ - ▁SPREAD
+ - ▁TRULY
+ - '{'
+ - <sos/eos>
+ init: xavier_uniform
+ input_size: 2048
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: null
+     zero_infinity: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/en_token_list/bpe_unigram1000/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ aux_ctc_tasks: []
+ frontend: null
+ frontend_conf: {}
+ specaug: null
+ specaug_conf: {}
+ normalize: global_mvn
+ normalize_conf:
+     stats_file: exp/asr_stats_extracted_en_bpe1000/train/feats_stats.npz
+ model: espnet
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: avhubert
+ encoder_conf:
+     avhubert_url: https://dl.fbaipublicfiles.com/avhubert/model/lrs3_vox/noise-pretrain/large_vox_iter5.pt
+     avhubert_dir_path: ./local/pre-trained
+     encoder_embed_dim: 1024
+     encoder_attention_heads: 16
+     encoder_ffn_embed_dim: 4096
+     encoder_layers: 24
+     dropout: 0.1
+     dropout_features: 0.1
+     encoder_layerdrop: 0.05
+     attention_dropout: 0.1
+     extracted: true
+     freeze_finetune_updates: 10000
+     feature_grad_mult: 1.0
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 4
+     linear_units: 4096
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.1
+     src_attention_dropout_rate: 0.1
+ preprocessor: default
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202308'
+ distributed: true
+ ```
+
+ </details>
+
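The config pairs `optim: adam` (`lr: 0.0003`) with `scheduler: warmuplr` (`warmup_steps: 8000`). To my understanding, ESPnet's `WarmupLR` is a Noam-style schedule: the learning rate grows linearly for `warmup_steps` updates, peaks at the configured `lr`, then decays with the inverse square root of the step. The sketch below reimplements that formula in plain Python for illustration; the function name `warmup_lr` is an assumption, and this is not the ESPnet source.

```python
def warmup_lr(step, base_lr=0.0003, warmup_steps=8000):
    """Noam-style warmup schedule (illustrative re-implementation).

    lr = base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)
    """
    if step < 1:
        raise ValueError("step counts from 1")
    return base_lr * warmup_steps ** 0.5 * min(
        step ** -0.5, step * warmup_steps ** -1.5
    )


# Defaults mirror optim_conf.lr and scheduler_conf.warmup_steps from the config.
for step in (1, 4000, 8000, 32000):
    print(step, f"{warmup_lr(step):.6g}")
```

Under this formula the rate reaches the configured 0.0003 exactly at step 8000, is half of that at step 4000 (linear ramp), and is half again at step 32000 (inverse square-root decay).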
+
+
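The `model_conf` block sets `ctc_weight: 0.3`, which in ESPnet's hybrid CTC/attention models interpolates the two training losses (the `loss_ctc.png` and `loss_att.png` plots in this commit track them separately). A minimal sketch of that interpolation follows; the function name and the example loss values are hypothetical.

```python
def joint_loss(loss_ctc, loss_att, ctc_weight=0.3):
    """Hybrid CTC/attention objective: weighted sum of the two losses.

    ctc_weight defaults to model_conf.ctc_weight from the config.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


# Hypothetical per-batch loss values, for illustration only.
print(joint_loss(loss_ctc=2.0, loss_att=1.0))
```

With `ctc_weight: 0.3`, the attention decoder loss carries 70% of the objective and the CTC branch 30%.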
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
data/en_token_list/bpe_unigram1000/bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f17273f292963ccee91e9975e1c500522bcd2a3db3438650223a35b30fd13e3e
+ size 253495
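`bpe.model` is the subword model behind the `token_list` in the README config (`token_type: bpe`; the directory name suggests a SentencePiece unigram model with 1000 pieces). In that list, `▁` (U+2581) marks a piece that starts a new word, and pieces without it attach to the previous piece. The sketch below joins pieces back into text without loading the model; the function name is an assumption, and the example pieces are taken from the token list.

```python
WORD_BOUNDARY = "\u2581"  # "▁", the SentencePiece marker for a preceding space


def pieces_to_text(pieces):
    """Concatenate subword pieces and turn boundary markers into spaces."""
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


# All five pieces appear in the token_list of this model.
pieces = ["▁THE", "▁PEOPLE", "▁WANT", "ED", "▁CHANGE"]
print(pieces_to_text(pieces))
```

Going the other way (text to pieces) requires the actual `bpe.model`, because the segmentation depends on the learned piece scores.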
exp/asr_stats_extracted_en_bpe1000/train/feats_stats.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:762b9d682c1e29c737eb764b96ac26f5e604c50502ddde2918f7613751ac0709
+ size 17146
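The three added lines for each binary file are a Git LFS pointer rather than the file itself: a spec version, the SHA-256 object ID of the real content, and its size in bytes. A small parser makes the structure explicit; the function name and the returned dictionary layout are illustrative assumptions, and the pointer text is copied from the `feats_stats.npz` diff.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its fields (illustrative helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "<key> <value>"
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:762b9d682c1e29c737eb764b96ac26f5e604c50502ddde2918f7613751ac0709
size 17146
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

After downloading the real file, comparing its byte size and SHA-256 digest against these fields is a simple integrity check.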
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/RESULTS.md ADDED
@@ -0,0 +1,29 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Thu Sep 28 23:59:06 KST 2023`
+ - python version: `3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]`
+ - espnet version: `espnet 202308`
+ - pytorch version: `pytorch 1.12.0`
+ - Git hash: `5d0758e2a7063b82d1f10a8ac2de98eb6cf8a352`
+ - Commit date: `Wed Aug 30 18:03:42 2023 -0400`
+
+ ## exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |inference_asr_model_valid.acc.ave/test|1321|9890|98.5|1.1|0.4|0.2|1.7|8.8|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |inference_asr_model_valid.acc.ave/test|1321|49750|99.4|0.2|0.4|0.2|0.8|8.8|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |inference_asr_model_valid.acc.ave/test|1321|14940|98.8|0.8|0.4|0.3|1.5|8.8|
+
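The dataset name `inference_asr_model_valid.acc.ave` and the checkpoint `valid.acc.ave_10best.pth` in the file list suggest that decoding used an average of the 10 checkpoints with the highest validation accuracy, in line with `keep_nbest_models: 10` and the `valid` / `acc` / `max` criterion. The toy sketch below shows the idea with plain Python lists standing in for tensors; it is not ESPnet's implementation, and the function name, data layout, and numbers are hypothetical.

```python
def average_n_best(checkpoints, n_best):
    """Average parameters of the n_best checkpoints with highest accuracy.

    checkpoints: list of (valid_acc, params), where params maps a parameter
    name to a list of floats (a stand-in for a tensor).
    """
    ranked = sorted(checkpoints, key=lambda item: item[0], reverse=True)
    kept = [params for _, params in ranked[:n_best]]
    averaged = {}
    for name in kept[0]:
        # zip(*...) walks the same position across all kept checkpoints.
        columns = zip(*(params[name] for params in kept))
        averaged[name] = [sum(col) / len(kept) for col in columns]
    return averaged


# Three toy "epochs"; only the two most accurate are averaged.
toy = [
    (0.90, {"w": [1.0, 2.0]}),
    (0.95, {"w": [3.0, 4.0]}),
    (0.80, {"w": [100.0, 100.0]}),
]
print(average_n_best(toy, n_best=2))
```

Averaging nearby checkpoints usually smooths out epoch-to-epoch noise, which is why the averaged model is the one reported in the tables.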
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/config.yaml ADDED
@@ -0,0 +1,1187 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ config: conf/train_avsr_avhubert_large.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: false
+ dry_run: false
+ iterator_type: sequence
+ valid_iterator_type: null
+ output_dir: exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 54927
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 20
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ - - valid
+   - acc
+   - max
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 16
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_extracted_en_bpe1000/train/speech_shape
+ - exp/asr_stats_extracted_en_bpe1000/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_extracted_en_bpe1000/valid/speech_shape
+ - exp/asr_stats_extracted_en_bpe1000/valid/text_shape.bpe
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 800
+ - 150
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ train_data_path_and_name_and_type:
+ - - dump/extracted/train/feats.scp
+   - speech
+   - kaldi_ark
+ - - dump/extracted/train/text
+   - text
+   - text
+ valid_data_path_and_name_and_type:
+ - - dump/extracted/val/feats.scp
+   - speech
+   - kaldi_ark
+ - - dump/extracted/val/text
+   - text
+   - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+   lr: 0.0003
+ scheduler: warmuplr
+ scheduler_conf:
+   warmup_steps: 8000
+ token_list:
+ - <blank>
+ - <unk>
+ - S
+ - ▁THE
+ - ▁TO
+ - ▁A
+ - ▁AND
+ - T
+ - ▁I
+ - ''''
+ - ▁OF
+ - ▁THAT
+ - ▁IN
+ - ING
+ - D
+ - ▁YOU
+ - ▁WE
+ - E
+ - ▁IT
+ - N
+ - ED
+ - ▁IS
+ - R
+ - M
+ - P
+ - Y
+ - ▁FOR
+ - ER
+ - ▁THIS
+ - ▁WAS
+ - RE
+ - C
+ - G
+ - ▁SO
+ - A
+ - ▁BE
+ - ▁THEY
+ - ▁HAVE
+ - ▁ARE
+ - O
+ - ▁
+ - ▁ON
+ - ▁WITH
+ - LY
+ - ▁WHAT
+ - U
+ - IN
+ - AL
+ - ▁MY
+ - I
+ - ▁S
+ - ▁DO
+ - B
+ - ▁RE
+ - L
+ - ▁ME
+ - ▁CAN
+ - ▁BUT
+ - LE
+ - ▁ABOUT
+ - OR
+ - ▁NOT
+ - VE
+ - F
+ - AR
+ - RA
+ - ▁ALL
+ - ▁OUR
+ - ▁PEOPLE
+ - ▁AT
+ - ▁C
+ - ▁AS
+ - IC
+ - ▁OR
+ - ▁LIKE
+ - W
+ - LL
+ - K
+ - ▁AN
+ - ▁THERE
+ - ENT
+ - ▁ONE
+ - ES
+ - ▁HE
+ - RI
+ - 'ON'
+ - ▁P
+ - ▁IF
+ - ▁FROM
+ - ▁JUST
+ - ▁WHEN
+ - TH
+ - ▁YOUR
+ - ▁US
+ - CE
+ - ▁DE
+ - ION
+ - IT
+ - ▁KNOW
+ - ▁HOW
+ - ▁T
+ - ▁BECAUSE
+ - CH
+ - V
+ - ▁OUT
+ - ▁B
+ - ▁UP
+ - ▁E
+ - ▁F
+ - TE
+ - ▁HAD
+ - ▁CO
+ - LI
+ - ▁TIME
+ - ▁THEIR
+ - ▁MORE
+ - UR
+ - ▁WHO
+ - ▁GO
+ - EN
+ - ▁G
+ - ATION
+ - AN
+ - CK
+ - TER
+ - ▁SEE
+ - ▁WOULD
+ - ▁THESE
+ - ▁NO
+ - ▁THEM
+ - ▁BY
+ - ▁THINK
+ - ▁WERE
+ - IL
+ - ATE
+ - ▁GET
+ - ▁SE
+ - ▁VERY
+ - ▁GOING
+ - ▁EX
+ - ▁REALLY
+ - ITY
+ - ▁WAY
+ - ▁CON
+ - H
+ - RO
+ - ▁DON
+ - ▁NOW
+ - ▁W
+ - X
+ - NE
+ - GE
+ - ▁WILL
+ - ▁MAKE
+ - ▁WANT
+ - ▁OTHER
+ - ▁SOME
+ - LA
+ - ▁WORLD
+ - ▁ST
+ - ▁COULD
+ - TION
+ - ▁WORK
+ - MENT
+ - ▁SHE
+ - ▁NEED
+ - ▁PA
+ - LO
+ - OL
+ - ▁SAY
+ - ▁MO
+ - ▁BA
+ - IST
+ - ▁FA
+ - IR
+ - ▁MA
+ - ERS
+ - ▁HAS
+ - VER
+ - ▁PO
+ - IVE
+ - ▁PRO
+ - ▁LIFE
+ - ▁INTO
+ - ▁WHICH
+ - ▁THINGS
+ - ▁WHERE
+ - ND
+ - ▁LA
+ - MP
+ - ▁BEEN
+ - ▁SOMETHING
+ - MA
+ - ▁THOSE
+ - US
+ - ▁NEW
+ - ▁CH
+ - ▁RA
+ - ▁ACTUALLY
+ - ▁YEARS
+ - ▁EVEN
+ - ▁TAKE
+ - ▁LOOK
+ - UL
+ - ▁RIGHT
+ - ▁SAID
+ - TIC
+ - ▁UN
+ - Z
+ - AS
+ - ▁DAY
+ - ▁HER
+ - IDE
+ - ▁BO
+ - ▁THAN
+ - ▁HERE
+ - ▁OVER
+ - ▁BACK
+ - ▁LO
+ - ▁FIRST
+ - ▁DI
+ - ▁MOST
+ - ▁COME
+ - ▁ALSO
+ - VI
+ - KE
+ - ▁WELL
+ - IES
+ - ABLE
+ - UT
+ - ▁THEN
+ - ▁CHANGE
+ - AGE
+ - ▁MUCH
+ - '0'
+ - ▁MEAN
+ - OM
+ - ▁CA
+ - CO
+ - AT
+ - ▁ANY
+ - ▁HAPPEN
+ - ▁ONLY
+ - ▁PART
+ - ▁SU
+ - ▁HIS
+ - ▁SP
+ - ▁DIS
+ - ANCE
+ - ID
+ - ▁MANY
+ - ▁RO
+ - '}'
+ - ▁{
+ - OW
+ - ▁O
+ - IGHT
+ - ▁GOOD
+ - UM
+ - ▁LIVE
+ - ▁LOT
+ - ▁D
+ - ▁TWO
+ - ▁LI
+ - ▁THING
+ - ▁GOT
+ - ▁TELL
+ - AC
+ - ▁EVERY
+ - EL
+ - CI
+ - ▁WHY
+ - TA
+ - FUL
+ - ▁BEING
+ - ANT
+ - EST
+ - ▁LEARN
+ - ▁COMP
+ - ▁DID
+ - URE
+ - PE
+ - ▁FEEL
+ - ▁DIFFERENT
+ - ▁PRE
+ - MO
+ - TI
+ - ▁HO
+ - ▁K
+ - ▁LITTLE
+ - IV
+ - ▁THROUGH
+ - ▁1
+ - INE
+ - ▁KIND
+ - ME
+ - RY
+ - ▁LET
+ - ▁HELP
+ - UN
+ - ICAL
+ - ▁VI
+ - ▁SAME
+ - ECT
+ - ▁HUMAN
+ - ▁GIVE
+ - HE
+ - ▁TALK
+ - ▁FE
+ - ▁HA
+ - ▁OWN
+ - ▁AROUND
+ - ▁USE
+ - IS
+ - ALLY
+ - ▁IDEA
+ - RESS
+ - ▁PROBLEM
+ - ▁PERSON
+ - ▁TE
+ - ▁FI
+ - ▁FIND
+ - ▁SA
+ - ▁START
+ - OS
+ - TED
+ - ▁BU
+ - LG
+ - NCE
+ - ATED
+ - ▁YEAR
+ - ▁DIDN
+ - ▁LOVE
+ - HO
+ - '5'
+ - ▁DOWN
+ - ▁SCHOOL
+ - ▁TODAY
+ - ▁QUESTION
+ - ▁HEAR
+ - DI
+ - ▁MAN
+ - ▁CAR
+ - MI
+ - ▁GREAT
+ - ▁CR
+ - ▁DOING
+ - IG
+ - ▁FACT
+ - ▁LE
+ - ▁LONG
+ - OUS
+ - ▁RU
+ - ▁PUT
+ - ▁AFTER
+ - ▁EN
+ - ▁M
+ - ▁GA
+ - ▁SHOW
+ - OP
+ - ▁SI
+ - ▁SHOULD
+ - ▁NE
+ - ▁STA
+ - ▁NEVER
+ - ▁BIG
+ - NS
+ - ▁THOUGHT
+ - ISH
+ - ▁MIGHT
+ - ▁ACT
+ - ▁PLACE
+ - LU
+ - END
+ - IZE
+ - ▁REAL
+ - ▁BETTER
+ - ATIVE
+ - IA
+ - ▁UNDERSTAND
+ - ▁POWER
+ - ▁IMPORTANT
+ - IAN
+ - ▁BRAIN
+ - ▁SYSTEM
+ - UAL
+ - NESS
+ - ▁END
+ - ▁ABLE
+ - ▁BEFORE
+ - ▁STORY
+ - ▁OFF
+ - TOR
+ - FF
+ - ▁STARTED
+ - ▁DR
+ - ▁MADE
+ - ▁ASK
+ - NA
+ - ▁HU
+ - ▁CREATE
+ - ATING
+ - ▁BI
+ - ARY
+ - ▁HIGH
+ - ▁HIM
+ - BO
+ - ITION
+ - ▁THREE
+ - ▁PER
+ - ▁AM
+ - ▁CALLED
+ - ▁APP
+ - ▁CAME
+ - ▁WOMEN
+ - ▁OLD
+ - TY
+ - ▁PLAY
+ - '4'
+ - PP
+ - ▁PH
+ - AG
+ - ▁BELIEVE
+ - ▁HOME
+ - ARD
+ - ▁FRIEND
+ - ▁RI
+ - ▁FOUND
+ - HA
+ - ▁HAND
+ - ▁DA
+ - ▁STILL
+ - ▁NA
+ - ▁WORD
+ - ▁TRANS
+ - ▁HEALTH
+ - OUND
+ - ▁BUILD
+ - ▁CARE
+ - ▁WI
+ - ▁NEXT
+ - ▁THANK
+ - ▁TURN
+ - ▁TOGETHER
+ - ▁TA
+ - ▁BECOME
+ - ▁EXPERIENCE
+ - VING
+ - ▁EM
+ - ▁MEN
+ - ISE
+ - ▁MAR
+ - ▁EACH
+ - ▁WENT
+ - ▁TRI
+ - ▁POINT
+ - ▁LAST
+ - ▁MAYBE
+ - ▁EVER
+ - ▁CALL
+ - WARD
+ - ▁CHILDREN
+ - ▁DOES
+ - CA
+ - ▁BIT
+ - UC
+ - LIC
+ - UGH
+ - ▁EXAMPLE
+ - ▁FEW
+ - ITIES
+ - ▁ANOTHER
+ - SH
+ - ▁TH
+ - ▁ALWAYS
+ - ▁H
+ - ▁READ
+ - ▁INTEREST
+ - FORM
+ - ▁STATE
+ - ▁MOVE
+ - IOUS
+ - ▁MIND
+ - 'NO'
+ - AM
+ - ▁TEACH
+ - ▁2
+ - ▁HARD
+ - ▁WANTED
+ - ▁20
+ - ▁GROW
+ - ▁JOB
+ - DA
+ - ▁TOO
+ - ▁VA
+ - OME
+ - ▁MAY
+ - '8'
+ - ▁SOCIAL
+ - ▁HI
+ - ▁FOOD
+ - BI
+ - ▁JO
+ - ▁COURSE
+ - ▁FR
+ - BA
+ - ▁MOMENT
+ - ▁AGAIN
+ - ▁DOESN
+ - ▁SHARE
+ - ▁AWAY
+ - ▁BETWEEN
+ - ▁LESS
+ - ▁SHA
+ - ▁MONEY
+ - ▁UNDER
+ - BER
+ - ▁DEVELOP
+ - ▁SECOND
+ - ▁NUMBER
+ - ▁ART
+ - QUE
+ - ▁FAMILY
+ - '1'
+ - '7'
+ - ▁SH
+ - '6'
+ - ▁EVERYTHING
+ - ▁FAR
+ - ▁WORKING
+ - ▁KIDS
+ - ▁PLAN
+ - ▁CHA
+ - ▁AGO
+ - ▁PI
+ - ▁ENOUGH
+ - ISM
+ - ▁AMERICA
+ - ▁THINKING
+ - ▁USED
+ - ▁REASON
+ - ▁TRY
+ - ▁SOMEONE
+ - ▁GENE
+ - ▁CU
+ - ▁STUDENT
+ - ▁TOLD
+ - ▁GU
+ - ▁TRYING
+ - ▁LEAD
+ - ▁MYSELF
+ - ▁BEST
+ - ▁FUTURE
+ - ▁MILLION
+ - ▁SMALL
+ - ▁TECHNOLOGY
+ - LESS
+ - ▁PASS
+ - ▁DONE
+ - ▁YOUNG
+ - '9'
+ - ▁SPACE
+ - ▁WATER
+ - ▁MATTER
+ - ▁OPEN
+ - ▁COUNTRY
+ - ▁REMEMBER
+ - ▁TALKING
+ - ▁REALIZE
+ - LAND
+ - ▁RESEARCH
+ - Q
+ - IAL
+ - ▁WAR
+ - ▁GROUP
+ - ▁BOOK
+ - ▁KEEP
+ - ▁DEF
+ - ▁STOP
+ - ▁HOPE
+ - ▁CONNECT
+ - ▁SENSE
+ - ▁ANSWER
+ - ▁WALK
+ - ▁DESIGN
+ - ▁WEEK
+ - ▁LANGUAGE
+ - ▁DATA
+ - ▁LOOKING
+ - ▁PERCENT
+ - ADE
+ - ▁CLASS
+ - ▁WHOLE
+ - ▁BODY
+ - ▁FOUR
+ - ▁OFTEN
+ - ▁ELSE
+ - ▁WITHOUT
+ - ▁PROCESS
+ - ▁FREE
+ - ▁MAKING
+ - IBLE
+ - ▁BRING
+ - ▁CHILD
+ - ▁GETTING
+ - ▁PROBABLY
+ - ▁ALLOW
+ - ▁SPEAK
+ - ▁COMMUNITY
+ - ▁HAVING
+ - ▁TOOK
+ - ▁OP
+ - ▁JU
+ - ▁MU
+ - ▁FACE
+ - ▁INFORMATION
+ - ABILITY
+ - ▁NAME
+ - ▁NI
+ - '2'
+ - ▁GIRL
+ - ▁CELL
+ - ▁ANYTHING
+ - ▁SCIENCE
+ - ▁STAND
+ - ▁WHILE
+ - ▁SUCH
+ - '000'
+ - ▁CASE
+ - J
+ - ANG
+ - ▁FIVE
+ - ▁GUY
+ - ▁FUN
+ - ▁BUSINESS
+ - ▁ROOM
+ - ▁SELF
+ - ▁LIVING
+ - ▁SURE
+ - ▁IMAGINE
+ - ▁ASKED
+ - ▁MIS
+ - ▁ENERGY
+ - ▁PROJECT
+ - ▁STUDY
+ - ▁DREAM
+ - ▁10
+ - ▁STORIES
+ - ▁ALREADY
+ - ▁TERM
+ - ▁EFFECT
+ - ▁KNEW
+ - ▁SOCIETY
+ - ▁PRODUCT
+ - ▁PRETTY
+ - ▁EVERYONE
+ - ▁HEAD
+ - ▁19
+ - ▁JA
+ - ▁LIGHT
+ - ▁LISTEN
+ - ▁MUSIC
+ - ▁LARGE
+ - ▁QUITE
+ - ▁J
+ - ▁BOTH
+ - ▁CHALLENGE
+ - ▁SORT
+ - ▁FELT
+ - ▁TREAT
+ - ▁EDUCATION
+ - ▁WRONG
+ - ▁YOURSELF
+ - ▁MIL
+ - ▁OURSELVES
+ - ▁SOUND
+ - ▁PROGRAM
+ - ▁3
+ - ▁CLOSE
+ - ▁QUA
+ - ▁SINGLE
+ - ▁MINUTE
+ - ▁NOTHING
+ - ▁ENVIRONMENT
+ - ▁PUBLIC
+ - ▁ORDER
+ - ▁OB
+ - ▁TRUE
+ - ▁STEP
+ - ▁WONDER
+ - ▁NIGHT
+ - ▁YET
+ - ▁EYE
+ - ▁LEFT
+ - SHIP
+ - ▁VALUE
+ - ▁WHETHER
+ - ▁MOTHER
+ - ▁SIMPLE
+ - ▁NU
+ - ▁WOMAN
+ - ▁LU
+ - ▁CONTROL
+ - ▁COMING
+ - ▁SAW
+ - ▁LEVEL
+ - ▁TEST
+ - ▁POSSIBLE
+ - ▁ACROSS
+ - ▁HOUSE
+ - ▁WATCH
+ - ▁GOVERNMENT
+ - ▁PARENTS
+ - ▁HALF
+ - ▁TEN
+ - ▁DEEP
+ - ▁CANCER
+ - ▁ISSUE
+ - ▁LATER
+ - ▁SOMETIMES
+ - ▁ANIMAL
+ - ▁SUPPORT
+ - ▁EAT
+ - ▁CULTURE
+ - ▁FULL
+ - ▁INSTEAD
+ - ▁EARTH
+ - ▁DISEASE
+ - ▁MIN
+ - ▁GAME
+ - ▁DECIDED
+ - ▁ALMOST
+ - ▁SUCCESS
+ - ▁AMAZING
+ - ▁DRIVE
+ - ▁DU
+ - ▁EMOTION
+ - ▁GLOBAL
+ - ▁EQU
+ - ▁PLANET
+ - ▁CERTAIN
+ - ▁HISTORY
+ - ▁MEET
+ - ▁TRAIN
+ - ▁COMPUTER
+ - ▁BECAME
+ - ▁TEAM
+ - ▁DISCOVER
+ - ▁DIFFERENCE
+ - WAY
+ - ▁FOCUS
+ - ▁PAST
+ - ▁RESULT
+ - ▁MONTHS
+ - ▁MODEL
+ - ▁YES
+ - ▁VO
+ - ▁COUNTRIES
+ - ▁STUFF
+ - ▁FIGURE
+ - ▁30
+ - ▁PATIENT
+ - ▁SPEND
+ - ▁ENTIRE
+ - ▁INDIVIDUAL
+ - ▁UNTIL
+ - ▁THOUGH
+ - ▁DECISION
+ - ▁CHOICE
+ - ▁AFRICA
+ - ▁RELATIONSHIP
+ - ▁BREAK
+ - ▁SOMEBODY
+ - ▁FOLLOW
+ - ▁CONVERSATION
+ - ▁LEAVE
+ - ▁THOUSAND
+ - ▁SIGN
+ - ▁SINCE
+ - ▁DIFFICULT
+ - ▁IMPACT
+ - ▁HOURS
+ - ▁COUPLE
+ - ▁CAUSE
+ - ▁PARTICULAR
+ - ▁DOCTOR
+ - ▁TAKING
+ - ▁COMPANY
+ - ▁EVERYBODY
+ - ▁50
+ - ▁DIRECT
+ - ▁EXPECT
+ - ▁200
+ - ▁ORGAN
+ - ▁EXACTLY
+ - ▁THEMSELVES
+ - ▁HAPPY
+ - ▁MUST
+ - ▁SAFE
+ - ▁BASED
+ - ▁BEAUTIFUL
+ - ▁PHONE
+ - ▁AGAINST
+ - ▁WRITE
+ - ▁DRUG
+ - ▁PICTURE
+ - ▁MEDIA
+ - ▁WAIT
+ - ▁FRONT
+ - ▁RISK
+ - ▁BEHAVIOR
+ - ▁BLACK
+ - ▁100
+ - ▁NATURE
+ - ▁ORGANIZATION
+ - ▁HUNDRED
+ - ▁EASY
+ - ▁ACCESS
+ - ▁HOLD
+ - ▁COMMON
+ - ▁MARKET
+ - ▁GRAND
+ - ▁VOICE
+ - ▁DEATH
+ - ▁PIECE
+ - ▁BILLION
+ - ▁LEAST
+ - ▁DURING
+ - '3'
+ - ▁NATURAL
+ - ▁TYPE
+ - ▁INVEST
+ - ▁GENERATION
+ - ENCY
+ - ▁STRONG
+ - OLOGICAL
+ - ▁CLEAR
+ - ▁PRESENT
+ - ▁INTERNET
+ - ▁KILL
+ - OLOGY
+ - ▁SUPER
+ - ▁UNITED
+ - ▁IMAGE
+ - ▁RATHER
+ - ▁SOLUTION
+ - ▁ECONOMIC
+ - ▁PROTECT
+ - ▁BEHIND
+ - ▁COLLECT
+ - ▁SCIENTIST
+ - UDE
+ - ▁PRODUCE
+ - ▁PERFECT
+ - ▁DOLLARS
+ - ▁VIEW
+ - ▁CONSIDER
+ - ▁THIRD
+ - ▁MACHINE
+ - ▁OUTSIDE
+ - ▁SKILL
+ - ▁EXPERIMENT
+ - ▁COLLEGE
+ - ▁QUI
+ - ▁OPPORTUNITY
+ - ▁LOCAL
+ - ▁SIMPLY
+ - ▁EARLY
+ - ▁MAJOR
+ - ▁CANNOT
+ - ▁PHYSICAL
+ - ▁WHATEVER
+ - ▁MIDDLE
+ - ▁VIDEO
+ - ▁ALONG
+ - OGRAPH
+ - ▁SOLVE
+ - ▁KEY
+ - ▁TRUST
+ - ▁FIELD
+ - HOOD
+ - ▁ATTENTION
+ - ▁MICRO
+ - ▁SHORT
+ - ▁SITUATION
+ - ▁STREET
+ - ▁COMPANIES
+ - ▁POLITICAL
+ - ▁NORMAL
+ - ▁AMOUNT
+ - ▁SERVICE
+ - ▁OBJECT
+ - ▁POTENTIAL
+ - ▁COLOR
+ - ▁KNOWLEDGE
+ - ▁MORNING
+ - ▁TRUTH
+ - ▁UNIVERSITY
+ - ▁PROVIDE
+ - ▁RESOURCE
+ - ▁POSITIVE
+ - ▁EUROPE
+ - ▁SPECIAL
+ - ▁CONTINUE
+ - ▁BASICALLY
+ - ▁SMART
+ - ▁PRACTICE
+ - ▁POPULATION
+ - ▁TRAVEL
+ - ▁AFFECT
+ - ▁FINALLY
+ - ▁APPROACH
+ - ▁COUNT
+ - ▁PERHAPS
+ - ▁INTERACT
+ - ▁EXPLAIN
+ - ▁ENGINEER
+ - ▁ENGAGE
+ - ▁SITTING
+ - ▁OFFICE
+ - ▁COMPLEX
+ - ▁WHITE
+ - ▁GENDER
+ - ▁MESSAGE
+ - ▁WORTH
+ - ▁ITSELF
+ - IZATION
+ - ▁BUILT
+ - ▁IMPROVE
+ - ▁OKAY
+ - ▁PRISON
+ - ▁MATERIAL
+ - ▁NETWORK
+ - ▁EITHER
+ - ▁GIVING
+ - ▁LIMIT
+ - ▁MEASURE
+ - ▁DARK
+ - ▁AUDIENCE
+ - ▁ACCEPT
+ - ▁RECORD
+ - ▁OCEAN
+ - ▁CHOOSE
+ - ▁SPECIES
+ - ▁YORK
+ - ▁SUSTAIN
+ - ▁SLEEP
+ - ▁OBVIOUS
+ - ▁HOSPITAL
+ - ▁PERSPECTIVE
+ - ▁INCREASE
+ - ▁OPERA
+ - ▁TAUGHT
+ - ▁MULTI
+ - ▁CHANGING
+ - ▁JOURNEY
+ - ▁INDUSTRY
+ - ▁NEURO
+ - ▁REQUIRE
+ - ▁DECADE
+ - ▁CURRENT
+ - ▁PUSH
+ - ▁BENEFIT
+ - ▁YEAH
+ - ▁BLOOD
+ - ▁SCALE
+ - ▁ESPECIALLY
+ - ▁COMMUNITIES
+ - ▁ADULT
+ - ▁CHARACTER
+ - ▁REPRESENT
+ - IFIED
+ - ▁SUFFER
+ - ▁RECOGNIZE
+ - ▁CENTURY
+ - ▁SUDDEN
+ - ▁FUNCTION
+ - ▁ACHIEVE
+ - ▁SIMILAR
+ - ▁BROUGHT
+ - ▁TRADITION
+ - ▁UNIVERSE
+ - ▁CLIMATE
+ - ▁BREATH
+ - ▁EXTREME
+ - ▁REPORT
+ - ▁DAUGHTER
+ - ▁COMFORT
+ - ▁CONCEPT
+ - ▁ECONOMY
+ - ▁INNOVATION
+ - ▁QUICKLY
+ - ▁SUGGEST
+ - ▁SPECIFIC
+ - ▁CRAZY
+ - ▁CONSCIOUS
+ - ▁SPREAD
+ - ▁TRULY
+ - '{'
+ - <sos/eos>
+ init: xavier_uniform
+ input_size: 2048
+ ctc_conf:
+   dropout_rate: 0.0
+   ctc_type: builtin
+   reduce: true
+   ignore_nan_grad: null
+   zero_infinity: true
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/en_token_list/bpe_unigram1000/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ aux_ctc_tasks: []
+ frontend: null
+ frontend_conf: {}
+ specaug: null
+ specaug_conf: {}
+ normalize: global_mvn
+ normalize_conf:
+   stats_file: exp/asr_stats_extracted_en_bpe1000/train/feats_stats.npz
+ model: espnet
+ model_conf:
+   ctc_weight: 0.3
+   lsm_weight: 0.1
+   length_normalized_loss: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: avhubert
+ encoder_conf:
+   avhubert_url: https://dl.fbaipublicfiles.com/avhubert/model/lrs3_vox/noise-pretrain/large_vox_iter5.pt
+   avhubert_dir_path: ./local/pre-trained
+   encoder_embed_dim: 1024
+   encoder_attention_heads: 16
+   encoder_ffn_embed_dim: 4096
+   encoder_layers: 24
+   dropout: 0.1
+   dropout_features: 0.1
+   encoder_layerdrop: 0.05
+   attention_dropout: 0.1
+   extracted: true
+   freeze_finetune_updates: 10000
+   feature_grad_mult: 1.0
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+   attention_heads: 4
+   linear_units: 4096
+   num_blocks: 6
+   dropout_rate: 0.1
+   positional_dropout_rate: 0.1
+   self_attention_dropout_rate: 0.1
+   src_attention_dropout_rate: 0.1
+ preprocessor: default
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202308'
+ distributed: true
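The config above pairs `scheduler: warmuplr` (`warmup_steps: 8000`) with `optim_conf.lr: 0.0003`, which implies a learning rate that ramps up and then decays. Below is a minimal sketch of that curve, assuming `warmuplr` follows the Noam-style rule `lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)`. The function name `warmup_lr` is illustrative and not part of ESPnet.

```python
def warmup_lr(step: int, base_lr: float = 0.0003, warmup_steps: int = 8000) -> float:
    """Noam-style warmup: linear ramp up to base_lr, then inverse-sqrt decay.

    Assumption: this mirrors what `scheduler: warmuplr` computes; check the
    ESPnet source before relying on exact values.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    # Before warmup_steps the second term of min() is smaller (linear ramp);
    # afterwards the first term is smaller (decay proportional to 1/sqrt(step)).
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    for step in (1, 4000, 8000, 32000):
        print(step, warmup_lr(step))
```

Under this assumption the rate peaks at the configured `lr` at step 8000, and is half that value at both step 4000 and step 32000.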
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/acc.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/backward_time.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/cer.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/cer_ctc.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/clip.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/forward_time.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/grad_norm.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/iter_time.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/loss.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/loss_att.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/loss_ctc.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/loss_scale.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/optim0_lr0.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/optim_step_time.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/train_time.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/images/wer.png ADDED
exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/valid.acc.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bab5023e170d2fa1bdce1b185374decd2f190b2d48646e2c8f32d35103ea2c89
+ size 1714241991
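The three lines added for `valid.acc.ave_10best.pth` are a Git LFS pointer rather than the checkpoint itself: they record the spec version, a SHA-256 digest, and the size in bytes. Here is a rough sketch of how a downloaded checkpoint could be checked against such a pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and use only the Python standard library.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:bab5023e170d2fa1bdce1b185374decd2f190b2d48646e2c8f32d35103ea2c89
size 1714241991
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines into a dict with the oid broken into algo/digest."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value.strip()
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and hash agree with the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so a multi-gigabyte checkpoint never sits fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

A call such as `matches_pointer("valid.acc.ave_10best.pth", parse_lfs_pointer(POINTER_TEXT))` would then tell you whether a local copy is complete. The size check runs first because it is much cheaper than hashing roughly 1.7 GB.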
meta.yaml ADDED
@@ -0,0 +1,7 @@
+ espnet: '202308'
+ files:
+   asr_model_file: exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/valid.acc.ave_10best.pth
+ python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
+ torch: pytorch 1.12.0
+ yaml_files:
+   asr_train_config: exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/config.yaml
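`meta.yaml` maps logical names (`asr_model_file`, `asr_train_config`) to paths relative to the repository root. As a small sanity check after cloning, one might confirm that every referenced file is present. The sketch below assumes the YAML has already been loaded into a plain dict, so the `META` literal simply mirrors the file above, and `missing_files` is a hypothetical helper.

```python
from pathlib import Path

# Mirrors the `files` and `yaml_files` sections of meta.yaml shown above.
META = {
    "files": {
        "asr_model_file": "exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/valid.acc.ave_10best.pth",
    },
    "yaml_files": {
        "asr_train_config": "exp/asr_train_avsr_avhubert_large_extracted_en_bpe1000/config.yaml",
    },
}


def missing_files(meta: dict, root: str = ".") -> list:
    """List 'name: path' entries from meta whose path is not a file under root."""
    base = Path(root)
    missing = []
    for section in ("files", "yaml_files"):
        for name, rel_path in meta.get(section, {}).items():
            if not (base / rel_path).is_file():
                missing.append(f"{name}: {rel_path}")
    return missing


if __name__ == "__main__":
    for entry in missing_files(META):
        print("missing:", entry)
```

An empty result means both the averaged checkpoint and its training config are where `meta.yaml` says they are. It does not confirm that the checkpoint was actually fetched from LFS rather than left as a pointer file.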