Siddhant committed on
Commit c01a05b
1 Parent(s): 59df14a

Upload model from zenodo

Files changed (19)
  1. README.md +43 -0
  2. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/RESULTS.md +31 -0
  3. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/config.yaml +307 -0
  4. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/acc.png +0 -0
  5. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/backward_time.png +0 -0
  6. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/cer.png +0 -0
  7. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/cer_ctc.png +0 -0
  8. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/forward_time.png +0 -0
  9. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/gpu_max_cached_mem_GB.png +0 -0
  10. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/iter_time.png +0 -0
  11. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/loss.png +0 -0
  12. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/loss_att.png +0 -0
  13. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/loss_ctc.png +0 -0
  14. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/optim0_lr0.png +0 -0
  15. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/optim_step_time.png +0 -0
  16. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/train_time.png +0 -0
  17. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/wer.png +0 -0
  18. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/valid.acc.ave_5best.pth +3 -0
  19. meta.yaml +8 -0
README.md ADDED
@@ -0,0 +1,43 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: en
+ datasets:
+ - fsc_challenge
+ license: cc-by-4.0
+ ---
+ ## ESPnet2 ASR pretrained model
+ ### `siddhana/fsc_challenge_asr_train_asr_hubert_transformer_adam_specaug_raw_en_word_valid.acc.ave_5best`
+ ♻️ Imported from https://zenodo.org/record/5656007
+
+ This model was trained by siddhana using fsc_challenge/asr1 recipe in [espnet](https://github.com/espnet/espnet/).
+ ### Demo: How to use in ESPnet2
+ ```python
+ # coming soon
+ ```
+ ### Citing ESPnet
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
+ year={2018},
+ booktitle={Proceedings of Interspeech},
+ pages={2207--2211},
+ doi={10.21437/Interspeech.2018-1456},
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+ ```
+ or arXiv:
+ ```bibtex
+ @misc{watanabe2018espnet,
+ title={ESPnet: End-to-End Speech Processing Toolkit},
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Enrique Yalta Soplin and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ year={2018},
+ eprint={1804.00015},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
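The README's demo section is still a placeholder. A practical first step when using this checkpoint is interpreting the intent labels that appear in `token_list` in config.yaml, for example `activate_lights_washroom`, `increase_volume_none` and `change_language_Chinese_none`. The sketch below assumes, from the shape of those labels rather than from anything documented in this commit, that each label is an action, an object and a location joined by underscores, and that only the action can itself contain an underscore. `Intent` and `parse_intent` are hypothetical names, not part of ESPnet.

```python
from typing import NamedTuple


class Intent(NamedTuple):
    """One Fluent Speech Commands style (action, object, location) triple."""
    action: str
    obj: str
    location: str


def parse_intent(label: str) -> Intent:
    """Split an intent label such as "activate_lights_washroom".

    Assumption: the last two underscore-separated fields are the object and
    the location, so a multi-word action like "change_language" stays whole.
    """
    parts = label.rsplit("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not an action_object_location label: {label!r}")
    return Intent(*parts)


if __name__ == "__main__":
    for label in ("activate_lights_washroom", "increase_volume_none",
                  "change_language_Chinese_none"):
        print(parse_intent(label))
```

Splitting from the right with `rsplit("_", 2)` is what keeps `change_language` intact while still separating `Chinese` and `none`.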
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/RESULTS.md ADDED
@@ -0,0 +1,31 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Sun Oct 3 22:25:25 EDT 2021`
+ - python version: `3.8.11 (default, Aug 3 2021, 15:09:35) [GCC 7.5.0]`
+ - espnet version: `espnet 0.10.3a3`
+ - pytorch version: `pytorch 1.9.0+cu102`
+ - Git hash: `97b9dad4dbca71702cb7928a126ec45d96414a3f`
+ - Commit date: `Mon Sep 13 22:55:04 2021 +0900`
+
+ ## asr_train_asr_hubert_transformer_adam_specaug_raw_en_word
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |inference_asr_model_valid.acc.ave_5best/spk_test|3349|17937|98.5|1.2|0.3|0.5|2.0|4.7|
+ |inference_asr_model_valid.acc.ave_5best/utt_test|4204|22540|85.5|12.6|1.9|3.1|17.6|44.7|
+ |inference_asr_model_valid.acc.ave_5best/valid|2597|13782|98.8|0.8|0.4|0.2|1.4|2.9|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |inference_asr_model_valid.acc.ave_5best/spk_test|3349|152191|99.2|0.5|0.3|0.3|1.1|4.7|
+ |inference_asr_model_valid.acc.ave_5best/utt_test|4204|191435|92.4|5.3|2.3|2.7|10.3|44.7|
+ |inference_asr_model_valid.acc.ave_5best/valid|2597|117282|99.4|0.4|0.3|0.2|0.8|2.9|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
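In sclite-style scoring, which these tables appear to follow, Err is the sum of the substitution, deletion and insertion rates, and Corr + Sub + Del = 100 because correct, substituted and deleted tokens together cover the reference. The rows above satisfy both relations up to one-decimal rounding; for the utt_test WER, 12.6 + 1.9 + 3.1 = 17.6. The stdlib sketch below parses a table in this format and checks those relations. The 0.2-point tolerance is an assumption chosen to absorb rounding, not something ESPnet specifies.

```python
# WER table copied verbatim from RESULTS.md above.
WER_TABLE = """\
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|inference_asr_model_valid.acc.ave_5best/spk_test|3349|17937|98.5|1.2|0.3|0.5|2.0|4.7|
|inference_asr_model_valid.acc.ave_5best/utt_test|4204|22540|85.5|12.6|1.9|3.1|17.6|44.7|
|inference_asr_model_valid.acc.ave_5best/valid|2597|13782|98.8|0.8|0.4|0.2|1.4|2.9|
"""


def parse_table(text):
    """Turn a pipe-delimited markdown table into a list of {column: value} dicts."""
    lines = [ln.strip().strip("|") for ln in text.strip().splitlines()]
    header = lines[0].split("|")
    rows = []
    for line in lines[2:]:  # skip the header row and the |---| separator
        cells = line.split("|")
        row = {header[0]: cells[0]}  # the dataset name stays a string
        row.update({k: float(v) for k, v in zip(header[1:], cells[1:])})
        rows.append(row)
    return rows


def is_consistent(row, tol=0.2):
    """Check Err = Sub + Del + Ins and Corr = 100 - Sub - Del, up to rounding."""
    err_ok = abs(row["Sub"] + row["Del"] + row["Ins"] - row["Err"]) <= tol
    corr_ok = abs(100.0 - row["Sub"] - row["Del"] - row["Corr"]) <= tol
    return err_ok and corr_ok


if __name__ == "__main__":
    for row in parse_table(WER_TABLE):
        subset = row["dataset"].rsplit("/", 1)[-1]
        print(f"{subset}: WER {row['Err']}% consistent={is_consistent(row)}")
```

The same check applies to the CER rows. Note that S.Err (sentence error rate) is identical in the WER and CER tables, as expected, since a sentence is wrong whenever any of its tokens is wrong.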
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/config.yaml ADDED
@@ -0,0 +1,307 @@
+ config: conf/tuning/train_asr_hubert_transformer_adam_specaug.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 80
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - train
+     - loss
+     - min
+ -   - valid
+     - loss
+     - min
+ -   - train
+     - acc
+     - max
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 5
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_en_word/train/speech_shape
+ - exp/asr_stats_raw_en_word/train/text_shape.word
+ valid_shape_file:
+ - exp/asr_stats_raw_en_word/valid/speech_shape
+ - exp/asr_stats_raw_en_word/valid/text_shape.word
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/valid/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/valid/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 0.0002
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 25000
+ token_list:
+ - <blank>
+ - <unk>
+ - ▁the
+ - ▁
+ - e
+ - ▁turn
+ - ▁in
+ - s
+ - ▁lights
+ - o
+ - ▁m
+ - c
+ - i
+ - ▁heat
+ - a
+ - t
+ - hroom
+ - ▁up
+ - ▁s
+ - ▁on
+ - ▁down
+ - n
+ - ▁temperature
+ - crease
+ - p
+ - ▁t
+ - u
+ - ▁b
+ - ▁switch
+ - w
+ - h
+ - d
+ - ou
+ - ▁kitchen
+ - ▁volume
+ - ▁off
+ - ing
+ - y
+ - increase_volume_none
+ - ▁bedroom
+ - ▁langu
+ - age
+ - as
+ - decrease_volume_none
+ - ▁l
+ - r
+ - er
+ - at
+ - ▁d
+ - l
+ - decrease_heat_washroom
+ - increase_heat_washroom
+ - k
+ - an
+ - g
+ - increase_heat_none
+ - oo
+ - decrease_heat_none
+ - ge
+ - change_language_none_none
+ - activate_lights_washroom
+ - activate_lights_kitchen
+ - ow
+ - in
+ - activate_music_none
+ - mp
+ - deactivate_music_none
+ - increase_heat_bedroom
+ - increase_heat_kitchen
+ - decrease_heat_kitchen
+ - it
+ - activate_lights_bedroom
+ - deactivate_lights_bedroom
+ - f
+ - re
+ - decrease_heat_bedroom
+ - ed
+ - deactivate_lights_kitchen
+ - bring_newspaper_none
+ - bring_shoes_none
+ - bring_socks_none
+ - activate_lights_none
+ - deactivate_lights_none
+ - q
+ - deactivate_lights_washroom
+ - change_language_Chinese_none
+ - bring_juice_none
+ - j
+ - m
+ - deactivate_lamp_none
+ - activate_lamp_none
+ - change_language_Korean_none
+ - ▁k
+ - me
+ - change_language_German_none
+ - ▁o
+ - change_language_English_none
+ - ▁he
+ - ase
+ - ff
+ - ume
+ - ▁v
+ - x
+ - ▁u
+ - v
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ model_conf:
+     ctc_weight: 0.5
+     ignore_id: -1
+     lsm_weight: 0.0
+     length_normalized_loss: false
+     report_cer: true
+     report_wer: true
+     sym_space: <space>
+     sym_blank: <blank>
+     extract_feats_in_collect_stats: true
+ use_preprocessor: true
+ token_type: word
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: hubert_large_ll60k
+     download_dir: ./hub
+     multilayer_feature: true
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 30
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_range:
+     - 0
+     - 40
+     num_time_mask: 2
+ normalize: utterance_mvn
+ normalize_conf: {}
+ preencoder: linear
+ preencoder_conf:
+     input_size: 1024
+     output_size: 80
+ encoder: transformer
+ encoder_conf:
+     output_size: 256
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.0
+     input_layer: conv2d
+     normalize_before: true
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.0
+     src_attention_dropout_rate: 0.0
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.3a3
+ distributed: false
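The config trains with Adam at `lr: 0.0002` under the `warmuplr` scheduler with `warmup_steps: 25000`. As far as we know, ESPnet's WarmupLR follows the Noam-style schedule written out below: the rate rises linearly to the configured `lr` at step 25000 and then decays with the inverse square root of the step. This is a sketch of that formula, not ESPnet's code; `warmup_lr` is a hypothetical helper, and steps are assumed to count optimizer updates from 1 (with `accum_grad: 1`, one update per iteration).

```python
def warmup_lr(step: int, peak_lr: float = 2e-4, warmup_steps: int = 25000) -> float:
    """Noam-style warmup: linear ramp up to peak_lr, then step**-0.5 decay.

    Defaults mirror optim_conf.lr and scheduler_conf.warmup_steps in config.yaml.
    """
    if step < 1:
        raise ValueError("step counts optimizer updates from 1")
    # Before warmup_steps the second term is smaller (linear ramp);
    # after it the first term is smaller (inverse-square-root decay).
    return peak_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    for step in (1, 5_000, 12_500, 25_000, 50_000, 100_000):
        print(f"step {step:>7}: lr = {warmup_lr(step):.3e}")
```

Because of the inverse-square-root tail, the rate only falls back to half its peak at step 100000, four times the warmup length.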
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/acc.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/backward_time.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/cer.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/cer_ctc.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/forward_time.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/iter_time.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/loss.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/loss_att.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/loss_ctc.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/optim0_lr0.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/optim_step_time.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/train_time.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/wer.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/valid.acc.ave_5best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:29ee8265e7d7f4cc0ab13186cd45cf1713a8afa6b1a40683bae4fcfb82eabfb6
+ size 1375817761
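The `.pth` entry above is a Git LFS pointer rather than the checkpoint itself: it records the SHA-256 digest of the real file and its size in bytes (1375817761, roughly 1.4 GB). After downloading the checkpoint, recomputing both values is a simple way to confirm the download is intact. The stdlib-only sketch below does that; `parse_lfs_pointer` and `verify_download` are hypothetical helper names, and the local path in the usage comment is wherever you saved the file.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the valid.acc.ave_5best.pth entry above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:29ee8265e7d7f4cc0ab13186cd45cf1713a8afa6b1a40683bae4fcfb82eabfb6
size 1375817761
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse 'key value' lines of a Git LFS pointer into version, oid and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def verify_download(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Compare the file's size, then its streamed SHA-256, with the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"]


# Usage (path is an assumption about where the checkpoint was saved):
# verify_download(Path("exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/"
#                      "valid.acc.ave_5best.pth"), parse_lfs_pointer(POINTER))
```

Hashing in 1 MiB chunks keeps memory flat even for a file of this size.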
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: 0.10.3a3
+ files:
+     asr_model_file: exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/valid.acc.ave_5best.pth
+ python: "3.8.11 (default, Aug 3 2021, 15:09:35) \n[GCC 7.5.0]"
+ timestamp: 1636435922.417136
+ torch: 1.9.0+cu102
+ yaml_files:
+     asr_train_config: exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/config.yaml
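meta.yaml stores the packing time as a Unix `timestamp` (seconds since the epoch). Converting it to UTC and comparing it with the evaluation date in RESULTS.md shows that the archive was packed in November 2021, a little over five weeks after scoring. The sketch below uses only the standard library; the UTC offset for the RESULTS.md date is taken from its `EDT` suffix (UTC-4).

```python
from datetime import datetime, timezone

# `timestamp` from meta.yaml above.
PACK_TIMESTAMP = 1636435922.417136

# RESULTS.md date "Sun Oct 3 22:25:25 EDT 2021", rewritten in UTC (EDT = UTC-4).
RESULTS_DATE = datetime(2021, 10, 4, 2, 25, 25, tzinfo=timezone.utc)

packed_at = datetime.fromtimestamp(PACK_TIMESTAMP, tz=timezone.utc)
gap = packed_at - RESULTS_DATE

if __name__ == "__main__":
    print("packed at", packed_at.isoformat())
    print("days after the RESULTS.md run:", gap.days)
```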