ftshijt committed on
Commit
be22ac8
1 Parent(s): db572d7

Update model

README.md ADDED
@@ -0,0 +1,779 @@
1
+ ---
2
+ tags:
3
+ - espnet
4
+ - audio
5
+ - automatic-speech-recognition
6
+ language: noinfo
7
+ datasets:
8
+ - puebla_nahuatl
9
+ license: cc-by-4.0
10
+ ---
11
+
12
+ ## ESPnet2 ASR model
13
+
14
+ ### `espnet/ftshijt_espnet2_asr_puebla_nahuatl_transfer`
15
+
16
+ This model was trained by ftshijt using the puebla_nahuatl recipe in [espnet](https://github.com/espnet/espnet/).
17
+
18
+ ### Demo: How to use in ESPnet2
19
+
20
+ ```bash
21
+ cd espnet
22
+
23
+ pip install -e .
24
+ cd egs2/puebla_nahuatl/asr1
25
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/ftshijt_espnet2_asr_puebla_nahuatl_transfer
26
+ ```
27
+
28
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
29
+ # RESULTS
30
+ ## Environments
31
+ - date: `Sun Nov 7 18:16:55 EST 2021`
32
+ - python version: `3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]`
33
+ - espnet version: `espnet 0.10.4a1`
34
+ - pytorch version: `pytorch 1.9.0`
35
+ - Git hash: ``
36
+ - Commit date: ``
37
+
38
+ ## asr_train_asr_transformer_hubert_raw_bpe500_sp
39
+ ### WER
40
+
41
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
42
+ |---|---|---|---|---|---|---|---|---|
43
+ |decode_asr_lm_lm_train_bpe500_valid.loss.ave_asr_model_valid.acc.best/test|10576|90532|77.0|17.0|6.0|3.6|26.6|74.0|
44
+
45
+ ### CER
46
+
47
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
48
+ |---|---|---|---|---|---|---|---|---|
49
+ |decode_asr_lm_lm_train_bpe500_valid.loss.ave_asr_model_valid.acc.best/test|10576|590273|92.2|2.1|5.7|3.0|10.8|74.0|
50
+
51
+ ### TER
52
+
53
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
54
+ |---|---|---|---|---|---|---|---|---|
55
+ |decode_asr_lm_lm_train_bpe500_valid.loss.ave_asr_model_valid.acc.best/test|10576|242435|86.0|7.3|6.8|3.5|17.5|74.0|
56
+
57
+ ## ASR config
58
+
59
+ <details><summary>expand</summary>
60
+
61
+ ```
62
+ config: conf/tuning/train_asr_transformer_hubert.yaml
63
+ print_config: false
64
+ log_level: INFO
65
+ dry_run: false
66
+ iterator_type: sequence
67
+ output_dir: exp/asr_train_asr_transformer_hubert_raw_bpe500_sp
68
+ ngpu: 1
69
+ seed: 0
70
+ num_workers: 1
71
+ num_att_plot: 3
72
+ dist_backend: nccl
73
+ dist_init_method: env://
74
+ dist_world_size: null
75
+ dist_rank: null
76
+ local_rank: 0
77
+ dist_master_addr: null
78
+ dist_master_port: null
79
+ dist_launcher: null
80
+ multiprocessing_distributed: false
81
+ unused_parameters: false
82
+ sharded_ddp: false
83
+ cudnn_enabled: true
84
+ cudnn_benchmark: false
85
+ cudnn_deterministic: true
86
+ collect_stats: false
87
+ write_collected_feats: false
88
+ max_epoch: 100
89
+ patience: 15
90
+ val_scheduler_criterion:
91
+ - valid
92
+ - loss
93
+ early_stopping_criterion:
94
+ - valid
95
+ - loss
96
+ - min
97
+ best_model_criterion:
98
+ - - valid
99
+ - acc
100
+ - max
101
+ keep_nbest_models: 10
102
+ grad_clip: 5
103
+ grad_clip_type: 2.0
104
+ grad_noise: false
105
+ accum_grad: 2
106
+ no_forward_run: false
107
+ resume: true
108
+ train_dtype: float32
109
+ use_amp: false
110
+ log_interval: null
111
+ use_tensorboard: true
112
+ use_wandb: false
113
+ wandb_project: null
114
+ wandb_id: null
115
+ wandb_entity: null
116
+ wandb_name: null
117
+ wandb_model_log_interval: -1
118
+ detect_anomaly: false
119
+ pretrain_path: null
120
+ init_param: []
121
+ ignore_init_mismatch: false
122
+ freeze_param: []
123
+ num_iters_per_epoch: null
124
+ batch_size: 32
125
+ valid_batch_size: null
126
+ batch_bins: 1000000
127
+ valid_batch_bins: null
128
+ train_shape_file:
129
+ - exp/asr_stats_raw_bpe500_sp/train/speech_shape
130
+ - exp/asr_stats_raw_bpe500_sp/train/text_shape.bpe
131
+ valid_shape_file:
132
+ - exp/asr_stats_raw_bpe500_sp/valid/speech_shape
133
+ - exp/asr_stats_raw_bpe500_sp/valid/text_shape.bpe
134
+ batch_type: folded
135
+ valid_batch_type: null
136
+ fold_length:
137
+ - 80000
138
+ - 150
139
+ sort_in_batch: descending
140
+ sort_batch: descending
141
+ multiple_iterator: false
142
+ chunk_length: 500
143
+ chunk_shift_ratio: 0.5
144
+ num_cache_chunks: 1024
145
+ train_data_path_and_name_and_type:
146
+ - - /tmp/jiatong-150390.uytFFbyG/raw/train_sp/wav.scp
147
+ - speech
148
+ - kaldi_ark
149
+ - - /tmp/jiatong-150390.uytFFbyG/raw/train_sp/text
150
+ - text
151
+ - text
152
+ valid_data_path_and_name_and_type:
153
+ - - /tmp/jiatong-150390.uytFFbyG/raw/dev/wav.scp
154
+ - speech
155
+ - kaldi_ark
156
+ - - /tmp/jiatong-150390.uytFFbyG/raw/dev/text
157
+ - text
158
+ - text
159
+ allow_variable_data_keys: false
160
+ max_cache_size: 0.0
161
+ max_cache_fd: 32
162
+ valid_max_cache_size: null
163
+ optim: adam
164
+ optim_conf:
165
+ lr: 1.0
166
+ scheduler: noamlr
167
+ scheduler_conf:
168
+ warmup_steps: 25000
169
+ token_list:
170
+ - <blank>
171
+ - <unk>
172
+ - ':'
173
+ - N
174
+ - ▁A
175
+ - ▁WA
176
+ - ▁KE
177
+ - ▁YO
178
+ - ▁NE
179
+ - ▁SE
180
+ - H
181
+ - MO
182
+ - WA
183
+ - ''''
184
+ - ▁NO
185
+ - ▁I
186
+ - ▁N
187
+ - S
188
+ - ▁KI
189
+ - K
190
+ - ▁
191
+ - MAH
192
+ - KA
193
+ - TA
194
+ - L
195
+ - ▁POS
196
+ - PA
197
+ - ▁KA
198
+ - ▁TA
199
+ - ▁MO
200
+ - T
201
+ - ▁YEHWA
202
+ - I
203
+ - MEH
204
+ - ▁YA
205
+ - ▁DE
206
+ - MA
207
+ - A
208
+ - ▁TE
209
+ - TI
210
+ - TSI
211
+ - NI
212
+ - CHI
213
+ - ▁PERO
214
+ - KI
215
+ - LI
216
+ - TO
217
+ - WI
218
+ - ▁PARA
219
+ - KO
220
+ - E
221
+ - ▁O
222
+ - ▁IKA
223
+ - TE
224
+ - O
225
+ - W
226
+ - ▁NEH
227
+ - ▁NOCHI
228
+ - CH
229
+ - ▁TI
230
+ - ▁TIK
231
+ - LO
232
+ - ▁SAH
233
+ - ▁MAH
234
+ - NA
235
+ - LA
236
+ - ▁OMPA
237
+ - ▁IHKÓ
238
+ - YA
239
+ - ▁NI
240
+ - ▁PORQUE
241
+ - ▁MA
242
+ - YO
243
+ - ▁TEIN
244
+ - LIA
245
+ - ▁E
246
+ - MPA
247
+ - ▁NIKA
248
+ - X
249
+ - YAH
250
+ - ▁KWALTSI
251
+ - SA
252
+ - TSA
253
+ - ▁MOCHI
254
+ - ▁NIK
255
+ - ▁WE
256
+ - ▁TO
257
+ - TSÍ
258
+ - ▁SEMI
259
+ - ▁KITA
260
+ - WAK
261
+ - KWI
262
+ - MI
263
+ - ▁MM
264
+ - ▁XO
265
+ - ▁SEKI
266
+ - JÓ
267
+ - AH
268
+ - ▁KOMO
269
+ - R
270
+ - NE
271
+ - ▁OK
272
+ - ▁KWALI
273
+ - ▁CHI
274
+ - ▁YEH
275
+ - ▁NELI
276
+ - SE
277
+ - PO
278
+ - WAH
279
+ - PI
280
+ - ME
281
+ - KWA
282
+ - ▁PA
283
+ - ▁ONKAK
284
+ - KE
285
+ - ▁YE
286
+ - ▁T
287
+ - LTIK
288
+ - ▁TEHWA
289
+ - TAH
290
+ - ▁TIKI
291
+ - ▁QUE
292
+ - ▁NIKI
293
+ - PE
294
+ - ▁IWKI
295
+ - XI
296
+ - TOK
297
+ - ▁TAMAN
298
+ - ▁KO
299
+ - TSO
300
+ - LE
301
+ - RA
302
+ - SI
303
+ - WÍ
304
+ - MAN
305
+ - ▁TIMO
306
+ - 'NO'
307
+ - SO
308
+ - ▁MIAK
309
+ - U
310
+ - ▁TEH
311
+ - ▁KICHI
312
+ - ▁XA
313
+ - WE
314
+ - ▁KOW
315
+ - KEH
316
+ - NÍ
317
+ - LIK
318
+ - ▁ITECH
319
+ - TIH
320
+ - ▁PE
321
+ - ▁KIPIA
322
+ - ▁CUANDO
323
+ - ▁KWALTIA
324
+ - ▁HASTA
325
+ - LOWA
326
+ - ▁ENTÓ
327
+ - ▁NA
328
+ - XO
329
+ - RO
330
+ - TIA
331
+ - ▁NIKITA
332
+ - CHIHCHI
333
+ - ▁SEPA
334
+ - ▁MAHYÁ
335
+ - ▁PAHTI
336
+ - ▁K
337
+ - LIAH
338
+ - ▁SAYOH
339
+ - MATI
340
+ - ▁PI
341
+ - TS
342
+ - ▁MÁS
343
+ - XMATI
344
+ - KAH
345
+ - ▁XI
346
+ - M
347
+ - ▁ESTE
348
+ - HKO
349
+ - KOWIT
350
+ - MIKI
351
+ - CHO
352
+ - ▁TAK
353
+ - Á
354
+ - ▁KILIAH
355
+ - CHIO
356
+ - ▁KIHTOWA
357
+ - ▁KITE
358
+ - NEKI
359
+ - ▁ME
360
+ - XA
361
+ - ▁TEL
362
+ - B
363
+ - ▁KOWIT
364
+ - ▁ATA
365
+ - TIK
366
+ - ▁EKINTSI
367
+ - ▁IMA
368
+ - ▁KWA
369
+ - ▁OSO
370
+ - ▁NEHJÓ
371
+ - ▁ITEYO
372
+ - Y
373
+ - SKEH
374
+ - ▁ISTA
375
+ - ▁NIKILIA
376
+ - LIH
377
+ - ▁TIKWI
378
+ - ▁PANÉ
379
+ - KOWA
380
+ - ▁OX
381
+ - TEKI
382
+ - ▁SA
383
+ - NTE
384
+ - ▁KIKWI
385
+ - TSITSI
386
+ - NOH
387
+ - AHSI
388
+ - ▁IXO
389
+ - WIA
390
+ - LTSI
391
+ - ▁KIMA
392
+ - C
393
+ - ▁WEHWEI
394
+ - ▁TEPITSI
395
+ - ▁IHK
396
+ - ▁XIWIT
397
+ - YI
398
+ - LIS
399
+ - ▁CA
400
+ - XMATTOK
401
+ - SÁ
402
+ - ▁MOTA
403
+ - RE
404
+ - ▁TIKIHTO
405
+ - ▁MI
406
+ - ▁X
407
+ - D
408
+ - ▁SAN
409
+ - WIH
410
+ - ▁WEHKA
411
+ - KWE
412
+ - CHA
413
+ - ▁SI
414
+ - KTIK
415
+ - ▁YETOK
416
+ - ▁MOKA
417
+ - NEMI
418
+ - LILIA
419
+ - ▁¿
420
+ - TIW
421
+ - ▁KIHTOWAH
422
+ - LTI
423
+ - Ó
424
+ - MASÁ
425
+ - ▁POR
426
+ - ▁TIKITA
427
+ - KETSA
428
+ - ▁IWA
429
+ - METS
430
+ - YOH
431
+ - ▁TAKWA
432
+ - HKEH
433
+ - ▁KIKWIH
434
+ - ▁KIKWA
435
+ - NIA
436
+ - ▁ACHI
437
+ - ▁KIKWAH
438
+ - ▁KACHI
439
+ - ▁PO
440
+ - ▁IGUAL
441
+ - NAL
442
+ - ▁PILI
443
+ - ▁NIMAN
444
+ - YE
445
+ - ▁NIKMATI
446
+ - WIAH
447
+ - ▁KIPA
448
+ - ▁M
449
+ - J
450
+ - ▁KWI
451
+ - ▁WI
452
+ - WAYA
453
+ - Z
454
+ - ▁KITEKI
455
+ - G
456
+ - ▁'
457
+ - ▁IHKO
458
+ - CE
459
+ - ▁TONI
460
+ - ▁TSIKITSI
461
+ - P
462
+ - DO
463
+ - TOKEH
464
+ - NIK
465
+ - ▁TIKILIAH
466
+ - ▁KOWTAH
467
+ - ▁TAI
468
+ - ▁TATA
469
+ - TIAH
470
+ - CA
471
+ - PIL
472
+ - CHOWA
473
+ - ▁KIMATI
474
+ - ▁TAMA
475
+ - XKA
476
+ - XIWIT
477
+ - TOS
478
+ - KILIT
479
+ - ILWI
480
+ - SKI
481
+ - YEH
482
+ - DA
483
+ - WAYO
484
+ - ▁TAPA
485
+ - ▁NIMO
486
+ - CHIT
487
+ - ▁NIMITS
488
+ - ▁KINA
489
+ - PAHTI
490
+ - RI
491
+ - ▁BUENO
492
+ - ▁ESKI
493
+ - WAYAH
494
+ - PANO
495
+ - KOW
496
+ - WEYAK
497
+ - LPAN
498
+ - LTIA
499
+ - ▁KITO
500
+ - CO
501
+ - ▁TINE
502
+ - KIH
503
+ - JO
504
+ - ▁KATKA
505
+ - ▁TIKTA
506
+ - PAHTIA
507
+ - ▁XIWTSI
508
+ - ▁CHIKA
509
+ - ▁KANAH
510
+ - ▁KOYO
511
+ - MPI
512
+ - ▁IXIWYO
513
+ - IHTIK
514
+ - ▁KWE
515
+ - ▁XIW
516
+ - WILIA
517
+ - XTIK
518
+ - ▁VE
519
+ - ▁TIKMATI
520
+ - ▁KOKOLIS
521
+ - LKWI
522
+ - ▁AHKO
523
+ - MEKAT
524
+ - ▁TIKMA
525
+ - ▁NIMITSILIA
526
+ - ▁MITS
527
+ - XTA
528
+ - ▁CO
529
+ - ▁KOMA
530
+ - ▁KOMOHKÓ
531
+ - F
532
+ - ▁OKSEKI
533
+ - ▁TEISÁ
534
+ - ▁ESO
535
+ - ▁IKOWYO
536
+ - ▁ES
537
+ - TOHTO
538
+ - XTI
539
+ - ▁TSI
540
+ - ▁TIKO
541
+ - PIHPI
542
+ - ▁OKSÉ
543
+ - ▁WEHKAPAN
544
+ - KALAKI
545
+ - ▁WEL
546
+ - ▁MIGUEL
547
+ - TEKITI
548
+ - ▁TOKNI
549
+ - ROWA
550
+ - ▁MOSKALTIA
551
+ - Í
552
+ - XOKO
553
+ - ▁TIKCHI
554
+ - ▁EHE
555
+ - ▁KWO
556
+ - LPI
557
+ - HTOK
558
+ - TSTI
559
+ - TÍ
560
+ - ▁TEIHSÁ
561
+ - KILO
562
+ - ▁PUES
563
+ - SKIA
564
+ - HTIW
565
+ - LILIAH
566
+ - ▁IHWA
567
+ - ▁KOSTIK
568
+ - ▁TIKIHTOWAH
569
+ - ▁CHA
570
+ - ▁COMO
571
+ - ▁KIMANA
572
+ - CU
573
+ - TAMAN
574
+ - WITS
575
+ - ▁KOKO
576
+ - ILPIA
577
+ - ▁NIMONO
578
+ - ▁WELI
579
+ - ▁NIKWI
580
+ - WTOK
581
+ - ▁KINEKI
582
+ - KOKOH
583
+ - ▁P
584
+ - LTIAH
585
+ - XKO
586
+ - ▁ONKAYA
587
+ - TAPOWI
588
+ - MATTOK
589
+ - ▁MISMO
590
+ - ▁NIKIHTO
591
+ - ▁NIKMATTOK
592
+ - MESKIA
593
+ - ▁SOH
594
+ - KWOWIT
595
+ - XTIA
596
+ - WELITA
597
+ - ▁DESPUÉS
598
+ - ▁IXWA
599
+ - ZA
600
+ - TSAPOT
601
+ - SKAL
602
+ - ▁SIEMPRE
603
+ - TINEMI
604
+ - Ñ
605
+ - ▁ESKIA
606
+ - NELOWA
607
+ - ▁TZINACAPAN
608
+ - ▁DI
609
+ - XIWYO
610
+ - ▁AHA
611
+ - ▁AHWIA
612
+ - É
613
+ - ▁KIKWIAH
614
+ - MATTOKEH
615
+ - ▁ACHTO
616
+ - XTILIA
617
+ - TAPAL
618
+ - ▁KIHTO
619
+ - TEHTE
620
+ - ▁PORIN
621
+ - ▁TSOPE
622
+ - ▁KAHFE
623
+ - GU
624
+ - ▁NIMITSTAHTANI
625
+ - ▁TAHTA
626
+ - ▁KOWTATI
627
+ - ISWAT
628
+ - ▁TIKPIA
629
+ - ▁KOMEKAT
630
+ - TIOWIH
631
+ - ▁TIMONOHNO
632
+ - ▁TIEMPO
633
+ - WEHKA
634
+ - QUI
635
+ - ▁TIHTI
636
+ - ▁XOXOKTIK
637
+ - ▁TAXKAL
638
+ - EHE
639
+ - ▁AJÁ
640
+ - NANAKAT
641
+ - NIWKI
642
+ - ▁CI
643
+ - ▁ITSMOL
644
+ - ▁NIKPIA
645
+ - TEKPA
646
+ - ▁BO
647
+ - ▁TASOHKA
648
+ - Ú
649
+ - ¡
650
+ - '8'
651
+ - '9'
652
+ - '0'
653
+ - '1'
654
+ - '2'
655
+ - ¿
656
+ - Ò
657
+ - '4'
658
+ - À
659
+ - '7'
660
+ - '5'
661
+ - '3'
662
+ - ́
663
+ - V
664
+ - ̈
665
+ - Ï
666
+ - '6'
667
+ - Q
668
+ - Ì
669
+ - <sos/eos>
670
+ init: xavier_uniform
671
+ input_size: null
672
+ ctc_conf:
673
+ dropout_rate: 0.0
674
+ ctc_type: builtin
675
+ reduce: true
676
+ ignore_nan_grad: true
677
+ model_conf:
678
+ ctc_weight: 0.3
679
+ lsm_weight: 0.1
680
+ length_normalized_loss: false
681
+ extract_feats_in_collect_stats: false
682
+ use_preprocessor: true
683
+ token_type: bpe
684
+ bpemodel: data/token_list/bpe_unigram500/bpe.model
685
+ non_linguistic_symbols: null
686
+ cleaner: null
687
+ g2p: null
688
+ speech_volume_normalize: null
689
+ rir_scp: null
690
+ rir_apply_prob: 1.0
691
+ noise_scp: null
692
+ noise_apply_prob: 1.0
693
+ noise_db_range: '13_15'
694
+ frontend: s3prl
695
+ frontend_conf:
696
+ frontend_conf:
697
+ upstream: hubert_large_ll60k
698
+ download_dir: ./hub
699
+ multilayer_feature: true
700
+ fs: 16k
701
+ specaug: specaug
702
+ specaug_conf:
703
+ apply_time_warp: true
704
+ time_warp_window: 5
705
+ time_warp_mode: bicubic
706
+ apply_freq_mask: true
707
+ freq_mask_width_range:
708
+ - 0
709
+ - 30
710
+ num_freq_mask: 2
711
+ apply_time_mask: true
712
+ time_mask_width_range:
713
+ - 0
714
+ - 40
715
+ num_time_mask: 2
716
+ normalize: utterance_mvn
717
+ normalize_conf: {}
718
+ preencoder: linear
719
+ preencoder_conf:
720
+ input_size: 1024
721
+ output_size: 80
722
+ encoder: transformer
723
+ encoder_conf:
724
+ input_layer: conv2d
725
+ num_blocks: 12
726
+ linear_units: 2048
727
+ dropout_rate: 0.1
728
+ output_size: 256
729
+ attention_heads: 4
730
+ attention_dropout_rate: 0.0
731
+ postencoder: null
732
+ postencoder_conf: {}
733
+ decoder: transformer
734
+ decoder_conf:
735
+ input_layer: embed
736
+ num_blocks: 6
737
+ linear_units: 2048
738
+ dropout_rate: 0.1
739
+ required:
740
+ - output_dir
741
+ - token_list
742
+ version: 0.10.4a1
743
+ distributed: false
744
+ ```
745
+
746
+ </details>
747
+
748
+
749
+
750
+ ### Citing ESPnet
751
+
752
+ ```bibtex
753
+ @inproceedings{watanabe2018espnet,
754
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
755
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
756
+ year={2018},
757
+ booktitle={Proceedings of Interspeech},
758
+ pages={2207--2211},
759
+ doi={10.21437/Interspeech.2018-1456},
760
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
761
+ }
762
+
763
+
764
+
765
+
766
+ ```
767
+
768
+ or arXiv:
769
+
770
+ ```bibtex
771
+ @misc{watanabe2018espnet,
772
+ title={ESPnet: End-to-End Speech Processing Toolkit},
773
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
774
+ year={2018},
775
+ eprint={1804.00015},
776
+ archivePrefix={arXiv},
777
+ primaryClass={cs.CL}
778
+ }
779
+ ```
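The `token_list` in the README's config uses SentencePiece-style subword pieces, where `▁` (U+2581) marks the start of a word. The following is a minimal sketch of how such pieces map back to plain text. The helper name `pieces_to_text` is hypothetical and not part of ESPnet; the individual pieces appear in the list above, but this particular sequence is made up for illustration.

```python
WORD_BOUNDARY = "\u2581"  # the marker SentencePiece places at the start of a word


def pieces_to_text(pieces):
    """Join subword pieces and turn word-boundary markers into spaces."""
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


# Illustrative sequence only (an assumption, not real decoder output).
example = [WORD_BOUNDARY + "KWALTSI", WORD_BOUNDARY + "NIK", "MATI"]
print(pieces_to_text(example))
```

Pieces without the marker (such as `MATI`) attach to the preceding piece, which is why the vocabulary contains both `▁NIK` and bare suffix-like entries.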
data/token_list/bpe_unigram500/bpe.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0893a480fca4e0819a0b05b7a73aaa67489c61ae65f7ad59042d00824ce865af
3
+ size 244843
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/45epoch.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:71688cbfa4549dd3dbf4d776c4241ba3b14cacaff916d6d852d9880147a7b02e
3
+ size 1376993057
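The two entries above (`bpe.model` and `45epoch.pth`) are Git LFS pointer files rather than the binaries themselves: each records a spec version, a SHA-256 object id, and a size in bytes. As a rough sketch, assuming you have downloaded the real file and want to confirm it matches its pointer, the check could look like this. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of Git LFS or ESPnet.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check size and digest of raw bytes (assumes the pointer uses sha256)."""
    same_size = len(data) == pointer["size"]
    same_hash = hashlib.sha256(data).hexdigest() == pointer["digest"]
    return same_size and same_hash


# Pointer text copied from the bpe.model entry in this commit.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:0893a480fca4e0819a0b05b7a73aaa67489c61ae65f7ad59042d00824ce865af
size 244843
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["algo"], pointer["size"])
```

For a file as large as the checkpoint (1376993057 bytes), hashing in chunks from disk would be preferable to loading everything into memory; the in-memory version is kept here for brevity.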
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/RESULTS.md ADDED
@@ -0,0 +1,29 @@
1
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
2
+ # RESULTS
3
+ ## Environments
4
+ - date: `Sun Nov 7 18:16:55 EST 2021`
5
+ - python version: `3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]`
6
+ - espnet version: `espnet 0.10.4a1`
7
+ - pytorch version: `pytorch 1.9.0`
8
+ - Git hash: ``
9
+ - Commit date: ``
10
+
11
+ ## asr_train_asr_transformer_hubert_raw_bpe500_sp
12
+ ### WER
13
+
14
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
15
+ |---|---|---|---|---|---|---|---|---|
16
+ |decode_asr_lm_lm_train_bpe500_valid.loss.ave_asr_model_valid.acc.best/test|10576|90532|77.0|17.0|6.0|3.6|26.6|74.0|
17
+
18
+ ### CER
19
+
20
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
21
+ |---|---|---|---|---|---|---|---|---|
22
+ |decode_asr_lm_lm_train_bpe500_valid.loss.ave_asr_model_valid.acc.best/test|10576|590273|92.2|2.1|5.7|3.0|10.8|74.0|
23
+
24
+ ### TER
25
+
26
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
27
+ |---|---|---|---|---|---|---|---|---|
28
+ |decode_asr_lm_lm_train_bpe500_valid.loss.ave_asr_model_valid.acc.best/test|10576|242435|86.0|7.3|6.8|3.5|17.5|74.0|
29
+
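The columns in the three tables above are related by the usual scoring identities: `Corr + Sub + Del` accounts for every reference token (100%), and `Err = Sub + Del + Ins`. A quick sanity check over the reported rows, sketched below, confirms they hold up to rounding. The tolerance of 0.15 is an assumption chosen to absorb one-decimal rounding, and `check_row` is a hypothetical helper.

```python
# Rows copied from the WER, CER, and TER tables: (Corr, Sub, Del, Ins, Err) in percent.
rows = {
    "WER": (77.0, 17.0, 6.0, 3.6, 26.6),
    "CER": (92.2, 2.1, 5.7, 3.0, 10.8),
    "TER": (86.0, 7.3, 6.8, 3.5, 17.5),
}


def check_row(corr, sub, dele, ins, err, tol=0.15):
    """Return True if both identities hold within the rounding tolerance."""
    corr_ok = abs((corr + sub + dele) - 100.0) <= tol
    err_ok = abs((sub + dele + ins) - err) <= tol
    return corr_ok and err_ok


for name, row in rows.items():
    print(name, check_row(*row))
```

Note that the `Wrd` column counts characters in the CER table and BPE tokens in the TER table, so the three rows share a sentence count (10576) but not a unit count.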
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/config.yaml ADDED
@@ -0,0 +1,682 @@
1
+ config: conf/tuning/train_asr_transformer_hubert.yaml
2
+ print_config: false
3
+ log_level: INFO
4
+ dry_run: false
5
+ iterator_type: sequence
6
+ output_dir: exp/asr_train_asr_transformer_hubert_raw_bpe500_sp
7
+ ngpu: 1
8
+ seed: 0
9
+ num_workers: 1
10
+ num_att_plot: 3
11
+ dist_backend: nccl
12
+ dist_init_method: env://
13
+ dist_world_size: null
14
+ dist_rank: null
15
+ local_rank: 0
16
+ dist_master_addr: null
17
+ dist_master_port: null
18
+ dist_launcher: null
19
+ multiprocessing_distributed: false
20
+ unused_parameters: false
21
+ sharded_ddp: false
22
+ cudnn_enabled: true
23
+ cudnn_benchmark: false
24
+ cudnn_deterministic: true
25
+ collect_stats: false
26
+ write_collected_feats: false
27
+ max_epoch: 100
28
+ patience: 15
29
+ val_scheduler_criterion:
30
+ - valid
31
+ - loss
32
+ early_stopping_criterion:
33
+ - valid
34
+ - loss
35
+ - min
36
+ best_model_criterion:
37
+ - - valid
38
+ - acc
39
+ - max
40
+ keep_nbest_models: 10
41
+ grad_clip: 5
42
+ grad_clip_type: 2.0
43
+ grad_noise: false
44
+ accum_grad: 2
45
+ no_forward_run: false
46
+ resume: true
47
+ train_dtype: float32
48
+ use_amp: false
49
+ log_interval: null
50
+ use_tensorboard: true
51
+ use_wandb: false
52
+ wandb_project: null
53
+ wandb_id: null
54
+ wandb_entity: null
55
+ wandb_name: null
56
+ wandb_model_log_interval: -1
57
+ detect_anomaly: false
58
+ pretrain_path: null
59
+ init_param: []
60
+ ignore_init_mismatch: false
61
+ freeze_param: []
62
+ num_iters_per_epoch: null
63
+ batch_size: 32
64
+ valid_batch_size: null
65
+ batch_bins: 1000000
66
+ valid_batch_bins: null
67
+ train_shape_file:
68
+ - exp/asr_stats_raw_bpe500_sp/train/speech_shape
69
+ - exp/asr_stats_raw_bpe500_sp/train/text_shape.bpe
70
+ valid_shape_file:
71
+ - exp/asr_stats_raw_bpe500_sp/valid/speech_shape
72
+ - exp/asr_stats_raw_bpe500_sp/valid/text_shape.bpe
73
+ batch_type: folded
74
+ valid_batch_type: null
75
+ fold_length:
76
+ - 80000
77
+ - 150
78
+ sort_in_batch: descending
79
+ sort_batch: descending
80
+ multiple_iterator: false
81
+ chunk_length: 500
82
+ chunk_shift_ratio: 0.5
83
+ num_cache_chunks: 1024
84
+ train_data_path_and_name_and_type:
85
+ - - /tmp/jiatong-150390.uytFFbyG/raw/train_sp/wav.scp
86
+ - speech
87
+ - kaldi_ark
88
+ - - /tmp/jiatong-150390.uytFFbyG/raw/train_sp/text
89
+ - text
90
+ - text
91
+ valid_data_path_and_name_and_type:
92
+ - - /tmp/jiatong-150390.uytFFbyG/raw/dev/wav.scp
93
+ - speech
94
+ - kaldi_ark
95
+ - - /tmp/jiatong-150390.uytFFbyG/raw/dev/text
96
+ - text
97
+ - text
98
+ allow_variable_data_keys: false
99
+ max_cache_size: 0.0
100
+ max_cache_fd: 32
101
+ valid_max_cache_size: null
102
+ optim: adam
103
+ optim_conf:
104
+ lr: 1.0
105
+ scheduler: noamlr
106
+ scheduler_conf:
107
+ warmup_steps: 25000
108
+ token_list:
109
+ - <blank>
110
+ - <unk>
111
+ - ':'
112
+ - N
113
+ - ▁A
114
+ - ▁WA
115
+ - ▁KE
116
+ - ▁YO
117
+ - ▁NE
118
+ - ▁SE
119
+ - H
120
+ - MO
121
+ - WA
122
+ - ''''
123
+ - ▁NO
124
+ - ▁I
125
+ - ▁N
126
+ - S
127
+ - ▁KI
128
+ - K
129
+ - ▁
130
+ - MAH
131
+ - KA
132
+ - TA
133
+ - L
134
+ - ▁POS
135
+ - PA
136
+ - ▁KA
137
+ - ▁TA
138
+ - ▁MO
139
+ - T
140
+ - ▁YEHWA
141
+ - I
142
+ - MEH
143
+ - ▁YA
144
+ - ▁DE
145
+ - MA
146
+ - A
147
+ - ▁TE
148
+ - TI
149
+ - TSI
150
+ - NI
151
+ - CHI
152
+ - ▁PERO
153
+ - KI
154
+ - LI
155
+ - TO
156
+ - WI
157
+ - ▁PARA
158
+ - KO
159
+ - E
160
+ - ▁O
161
+ - ▁IKA
162
+ - TE
163
+ - O
164
+ - W
165
+ - ▁NEH
166
+ - ▁NOCHI
167
+ - CH
168
+ - ▁TI
169
+ - ▁TIK
170
+ - LO
171
+ - ▁SAH
172
+ - ▁MAH
173
+ - NA
174
+ - LA
175
+ - ▁OMPA
176
+ - ▁IHKÓ
177
+ - YA
178
+ - ▁NI
179
+ - ▁PORQUE
180
+ - ▁MA
181
+ - YO
182
+ - ▁TEIN
183
+ - LIA
184
+ - ▁E
185
+ - MPA
186
+ - ▁NIKA
187
+ - X
188
+ - YAH
189
+ - ▁KWALTSI
190
+ - SA
191
+ - TSA
192
+ - ▁MOCHI
193
+ - ▁NIK
194
+ - ▁WE
195
+ - ▁TO
196
+ - TSÍ
197
+ - ▁SEMI
198
+ - ▁KITA
199
+ - WAK
200
+ - KWI
201
+ - MI
202
+ - ▁MM
203
+ - ▁XO
204
+ - ▁SEKI
205
+ - JÓ
206
+ - AH
207
+ - ▁KOMO
208
+ - R
209
+ - NE
210
+ - ▁OK
211
+ - ▁KWALI
212
+ - ▁CHI
213
+ - ▁YEH
214
+ - ▁NELI
215
+ - SE
216
+ - PO
217
+ - WAH
218
+ - PI
219
+ - ME
220
+ - KWA
221
+ - ▁PA
222
+ - ▁ONKAK
223
+ - KE
224
+ - ▁YE
225
+ - ▁T
226
+ - LTIK
227
+ - ▁TEHWA
228
+ - TAH
229
+ - ▁TIKI
230
+ - ▁QUE
231
+ - ▁NIKI
232
+ - PE
233
+ - ▁IWKI
234
+ - XI
235
+ - TOK
236
+ - ▁TAMAN
237
+ - ▁KO
238
+ - TSO
239
+ - LE
240
+ - RA
241
+ - SI
242
+ - WÍ
243
+ - MAN
244
+ - ▁TIMO
245
+ - 'NO'
246
+ - SO
247
+ - ▁MIAK
248
+ - U
249
+ - ▁TEH
250
+ - ▁KICHI
251
+ - ▁XA
252
+ - WE
253
+ - ▁KOW
254
+ - KEH
255
+ - NÍ
256
+ - LIK
257
+ - ▁ITECH
258
+ - TIH
259
+ - ▁PE
260
+ - ▁KIPIA
261
+ - ▁CUANDO
262
+ - ▁KWALTIA
263
+ - ▁HASTA
264
+ - LOWA
265
+ - ▁ENTÓ
266
+ - ▁NA
267
+ - XO
268
+ - RO
269
+ - TIA
270
+ - ▁NIKITA
271
+ - CHIHCHI
272
+ - ▁SEPA
273
+ - ▁MAHYÁ
274
+ - ▁PAHTI
275
+ - ▁K
276
+ - LIAH
277
+ - ▁SAYOH
278
+ - MATI
279
+ - ▁PI
280
+ - TS
281
+ - ▁MÁS
282
+ - XMATI
283
+ - KAH
284
+ - ▁XI
285
+ - M
286
+ - ▁ESTE
287
+ - HKO
288
+ - KOWIT
289
+ - MIKI
290
+ - CHO
291
+ - ▁TAK
292
+ - Á
293
+ - ▁KILIAH
294
+ - CHIO
295
+ - ▁KIHTOWA
296
+ - ▁KITE
297
+ - NEKI
298
+ - ▁ME
299
+ - XA
300
+ - ▁TEL
301
+ - B
302
+ - ▁KOWIT
303
+ - ▁ATA
304
+ - TIK
305
+ - ▁EKINTSI
306
+ - ▁IMA
307
+ - ▁KWA
308
+ - ▁OSO
309
+ - ▁NEHJÓ
310
+ - ▁ITEYO
311
+ - Y
312
+ - SKEH
313
+ - ▁ISTA
314
+ - ▁NIKILIA
315
+ - LIH
316
+ - ▁TIKWI
317
+ - ▁PANÉ
318
+ - KOWA
319
+ - ▁OX
320
+ - TEKI
321
+ - ▁SA
322
+ - NTE
323
+ - ▁KIKWI
324
+ - TSITSI
325
+ - NOH
326
+ - AHSI
327
+ - ▁IXO
328
+ - WIA
329
+ - LTSI
330
+ - ▁KIMA
331
+ - C
332
+ - ▁WEHWEI
333
+ - ▁TEPITSI
334
+ - ▁IHK
335
+ - ▁XIWIT
336
+ - YI
337
+ - LIS
338
+ - ▁CA
339
+ - XMATTOK
340
+ - SÁ
341
+ - ▁MOTA
342
+ - RE
343
+ - ▁TIKIHTO
344
+ - ▁MI
345
+ - ▁X
346
+ - D
347
+ - ▁SAN
348
+ - WIH
349
+ - ▁WEHKA
350
+ - KWE
351
+ - CHA
352
+ - ▁SI
353
+ - KTIK
354
+ - ▁YETOK
355
+ - ▁MOKA
356
+ - NEMI
357
+ - LILIA
358
+ - ▁¿
359
+ - TIW
360
+ - ▁KIHTOWAH
361
+ - LTI
362
+ - Ó
363
+ - MASÁ
364
+ - ▁POR
365
+ - ▁TIKITA
366
+ - KETSA
367
+ - ▁IWA
368
+ - METS
369
+ - YOH
370
+ - ▁TAKWA
371
+ - HKEH
372
+ - ▁KIKWIH
373
+ - ▁KIKWA
374
+ - NIA
375
+ - ▁ACHI
376
+ - ▁KIKWAH
377
+ - ▁KACHI
378
+ - ▁PO
379
+ - ▁IGUAL
380
+ - NAL
381
+ - ▁PILI
382
+ - ▁NIMAN
383
+ - YE
384
+ - ▁NIKMATI
385
+ - WIAH
386
+ - ▁KIPA
387
+ - ▁M
388
+ - J
389
+ - ▁KWI
390
+ - ▁WI
391
+ - WAYA
392
+ - Z
393
+ - ▁KITEKI
394
+ - G
395
+ - ▁'
396
+ - ▁IHKO
397
+ - CE
398
+ - ▁TONI
399
+ - ▁TSIKITSI
400
+ - P
401
+ - DO
402
+ - TOKEH
403
+ - NIK
404
+ - ▁TIKILIAH
405
+ - ▁KOWTAH
406
+ - ▁TAI
407
+ - ▁TATA
408
+ - TIAH
409
+ - CA
410
+ - PIL
411
+ - CHOWA
412
+ - ▁KIMATI
413
+ - ▁TAMA
414
+ - XKA
415
+ - XIWIT
416
+ - TOS
417
+ - KILIT
418
+ - ILWI
419
+ - SKI
420
+ - YEH
421
+ - DA
422
+ - WAYO
423
+ - ▁TAPA
424
+ - ▁NIMO
425
+ - CHIT
426
+ - ▁NIMITS
427
+ - ▁KINA
428
+ - PAHTI
429
+ - RI
430
+ - ▁BUENO
431
+ - ▁ESKI
432
+ - WAYAH
433
+ - PANO
434
+ - KOW
435
+ - WEYAK
436
+ - LPAN
437
+ - LTIA
438
+ - ▁KITO
439
+ - CO
440
+ - ▁TINE
441
+ - KIH
442
+ - JO
443
+ - ▁KATKA
444
+ - ▁TIKTA
445
+ - PAHTIA
446
+ - ▁XIWTSI
447
+ - ▁CHIKA
448
+ - ▁KANAH
449
+ - ▁KOYO
450
+ - MPI
451
+ - ▁IXIWYO
452
+ - IHTIK
453
+ - ▁KWE
454
+ - ▁XIW
455
+ - WILIA
456
+ - XTIK
457
+ - ▁VE
458
+ - ▁TIKMATI
459
+ - ▁KOKOLIS
460
+ - LKWI
461
+ - ▁AHKO
462
+ - MEKAT
463
+ - ▁TIKMA
464
+ - ▁NIMITSILIA
465
+ - ▁MITS
466
+ - XTA
467
+ - ▁CO
468
+ - ▁KOMA
469
+ - ▁KOMOHKÓ
470
+ - F
471
+ - ▁OKSEKI
472
+ - ▁TEISÁ
473
+ - ▁ESO
474
+ - ▁IKOWYO
475
+ - ▁ES
476
+ - TOHTO
477
+ - XTI
478
+ - ▁TSI
479
+ - ▁TIKO
480
+ - PIHPI
481
+ - ▁OKSÉ
482
+ - ▁WEHKAPAN
483
+ - KALAKI
484
+ - ▁WEL
485
+ - ▁MIGUEL
486
+ - TEKITI
487
+ - ▁TOKNI
488
+ - ROWA
489
+ - ▁MOSKALTIA
490
+ - Í
491
+ - XOKO
492
+ - ▁TIKCHI
493
+ - ▁EHE
494
+ - ▁KWO
495
+ - LPI
496
+ - HTOK
497
+ - TSTI
498
+ - TÍ
499
+ - ▁TEIHSÁ
500
+ - KILO
501
+ - ▁PUES
502
+ - SKIA
503
+ - HTIW
504
+ - LILIAH
505
+ - ▁IHWA
506
+ - ▁KOSTIK
507
+ - ▁TIKIHTOWAH
508
+ - ▁CHA
509
+ - ▁COMO
510
+ - ▁KIMANA
511
+ - CU
512
+ - TAMAN
513
+ - WITS
514
+ - ▁KOKO
515
+ - ILPIA
516
+ - ▁NIMONO
517
+ - ▁WELI
518
+ - ▁NIKWI
519
+ - WTOK
520
+ - ▁KINEKI
521
+ - KOKOH
522
+ - ▁P
523
+ - LTIAH
524
+ - XKO
525
+ - ▁ONKAYA
526
+ - TAPOWI
527
+ - MATTOK
528
+ - ▁MISMO
529
+ - ▁NIKIHTO
530
+ - ▁NIKMATTOK
531
+ - MESKIA
532
+ - ▁SOH
533
+ - KWOWIT
534
+ - XTIA
535
+ - WELITA
536
+ - ▁DESPUÉS
537
+ - ▁IXWA
538
+ - ZA
539
+ - TSAPOT
540
+ - SKAL
541
+ - ▁SIEMPRE
542
+ - TINEMI
543
+ - Ñ
544
+ - ▁ESKIA
545
+ - NELOWA
546
+ - ▁TZINACAPAN
547
+ - ▁DI
548
+ - XIWYO
549
+ - ▁AHA
550
+ - ▁AHWIA
551
+ - É
552
+ - ▁KIKWIAH
553
+ - MATTOKEH
554
+ - ▁ACHTO
555
+ - XTILIA
556
+ - TAPAL
557
+ - ▁KIHTO
558
+ - TEHTE
559
+ - ▁PORIN
560
+ - ▁TSOPE
561
+ - ▁KAHFE
562
+ - GU
563
+ - ▁NIMITSTAHTANI
564
+ - ▁TAHTA
565
+ - ▁KOWTATI
566
+ - ISWAT
567
+ - ▁TIKPIA
568
+ - ▁KOMEKAT
569
+ - TIOWIH
570
+ - ▁TIMONOHNO
571
+ - ▁TIEMPO
572
+ - WEHKA
573
+ - QUI
574
+ - ▁TIHTI
575
+ - ▁XOXOKTIK
576
+ - ▁TAXKAL
577
+ - EHE
578
+ - ▁AJÁ
579
+ - NANAKAT
580
+ - NIWKI
581
+ - ▁CI
582
+ - ▁ITSMOL
583
+ - ▁NIKPIA
584
+ - TEKPA
585
+ - ▁BO
586
+ - ▁TASOHKA
587
+ - Ú
588
+ - ¡
589
+ - '8'
590
+ - '9'
591
+ - '0'
592
+ - '1'
593
+ - '2'
594
+ - ¿
595
+ - Ò
596
+ - '4'
597
+ - À
598
+ - '7'
599
+ - '5'
600
+ - '3'
601
+ - ́
602
+ - V
603
+ - ̈
604
+ - Ï
605
+ - '6'
606
+ - Q
607
+ - Ì
608
+ - <sos/eos>
609
+ init: xavier_uniform
610
+ input_size: null
611
+ ctc_conf:
612
+ dropout_rate: 0.0
613
+ ctc_type: builtin
614
+ reduce: true
615
+ ignore_nan_grad: true
616
+ model_conf:
617
+ ctc_weight: 0.3
618
+ lsm_weight: 0.1
619
+ length_normalized_loss: false
620
+ extract_feats_in_collect_stats: false
621
+ use_preprocessor: true
622
+ token_type: bpe
623
+ bpemodel: data/token_list/bpe_unigram500/bpe.model
624
+ non_linguistic_symbols: null
625
+ cleaner: null
626
+ g2p: null
627
+ speech_volume_normalize: null
628
+ rir_scp: null
629
+ rir_apply_prob: 1.0
630
+ noise_scp: null
631
+ noise_apply_prob: 1.0
632
+ noise_db_range: '13_15'
633
+ frontend: s3prl
634
+ frontend_conf:
635
+ frontend_conf:
636
+ upstream: hubert_large_ll60k
637
+ download_dir: ./hub
638
+ multilayer_feature: true
639
+ fs: 16k
640
+ specaug: specaug
641
+ specaug_conf:
642
+ apply_time_warp: true
643
+ time_warp_window: 5
644
+ time_warp_mode: bicubic
645
+ apply_freq_mask: true
646
+ freq_mask_width_range:
647
+ - 0
648
+ - 30
649
+ num_freq_mask: 2
650
+ apply_time_mask: true
651
+ time_mask_width_range:
652
+ - 0
653
+ - 40
654
+ num_time_mask: 2
655
+ normalize: utterance_mvn
656
+ normalize_conf: {}
657
+ preencoder: linear
658
+ preencoder_conf:
659
+ input_size: 1024
660
+ output_size: 80
661
+ encoder: transformer
662
+ encoder_conf:
663
+ input_layer: conv2d
664
+ num_blocks: 12
665
+ linear_units: 2048
666
+ dropout_rate: 0.1
667
+ output_size: 256
668
+ attention_heads: 4
669
+ attention_dropout_rate: 0.0
670
+ postencoder: null
671
+ postencoder_conf: {}
672
+ decoder: transformer
673
+ decoder_conf:
674
+ input_layer: embed
675
+ num_blocks: 6
676
+ linear_units: 2048
677
+ dropout_rate: 0.1
678
+ required:
679
+ - output_dir
680
+ - token_list
681
+ version: 0.10.4a1
682
+ distributed: false
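The config pairs `optim: adam` with `lr: 1.0` and `scheduler: noamlr` with `warmup_steps: 25000`. The base rate of 1.0 is not the rate actually applied: under the standard Noam schedule it is scaled down by the model size and the step count. The sketch below shows that arithmetic. The value `model_size=320` is an assumption, since the config does not set it; check the scheduler defaults in your ESPnet version before relying on the exact numbers.

```python
def noam_lr(step, base_lr=1.0, model_size=320, warmup_steps=25000):
    """Noam schedule: linear warmup followed by inverse-square-root decay."""
    if step < 1:
        raise ValueError("step must be >= 1")
    scale = min(step ** -0.5, step * warmup_steps ** -1.5)
    return base_lr * model_size ** -0.5 * scale


# model_size=320 is assumed; warmup_steps and base_lr come from the config above.
for step in (1, 1000, 25000, 100000):
    print(step, f"{noam_lr(step):.3e}")
```

Under these assumptions the rate peaks at the end of warmup (step 25000) at roughly 3.5e-4 and then decays, so four times as many steps halves the rate.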
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/images/acc.png ADDED
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/images/backward_time.png ADDED
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/images/cer.png ADDED
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/images/cer_ctc.png ADDED
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/images/forward_time.png ADDED
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/images/iter_time.png ADDED
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/images/loss.png ADDED
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/images/loss_att.png ADDED
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/images/loss_ctc.png ADDED
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/images/optim0_lr0.png ADDED
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/images/optim_step_time.png ADDED
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/images/train_time.png ADDED
exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/images/wer.png ADDED
meta.yaml ADDED
@@ -0,0 +1,8 @@
1
+ espnet: 0.10.5a1
2
+ files:
3
+ asr_model_file: exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/45epoch.pth
4
+ python: "3.9.7 (default, Sep 16 2021, 13:09:58) \n[GCC 7.5.0]"
5
+ timestamp: 1640101379.421356
6
+ torch: 1.9.0
7
+ yaml_files:
8
+ asr_train_config: exp/asr_train_asr_transformer_hubert_raw_bpe500_sp/config.yaml
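`meta.yaml` stores `timestamp` as a bare number, which reads most naturally as Unix epoch seconds (an assumption; the file does not say so). Converting it gives a human-readable time, sketched below using only the standard library. It also helps explain why `meta.yaml` lists espnet 0.10.5a1 while the training config records 0.10.4a1: the metadata was presumably written later, when the model was packed for upload.

```python
from datetime import datetime, timezone

# Value copied from the `timestamp` field of meta.yaml; assumed to be epoch seconds.
timestamp = 1640101379.421356

packed_at = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(packed_at.isoformat())
```

The result falls in late December 2021, several weeks after the `Sun Nov 7 18:16:55 EST 2021` date recorded in the results environment.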