Siddhant committed on
Commit
d4eb5f4
1 Parent(s): f9a53e5

import from zenodo

Files changed (19)
  1. README.md +43 -0
  2. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/RESULTS.md +29 -0
  3. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/config.yaml +355 -0
  4. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/acc.png +0 -0
  5. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/backward_time.png +0 -0
  6. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/cer.png +0 -0
  7. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/cer_ctc.png +0 -0
  8. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/forward_time.png +0 -0
  9. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/gpu_max_cached_mem_GB.png +0 -0
  10. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/iter_time.png +0 -0
  11. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/loss.png +0 -0
  12. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/loss_att.png +0 -0
  13. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/loss_ctc.png +0 -0
  14. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/optim0_lr0.png +0 -0
  15. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/optim_step_time.png +0 -0
  16. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/train_time.png +0 -0
  17. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/wer.png +0 -0
  18. exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/valid.acc.ave_5best.pth +3 -0
  19. meta.yaml +8 -0
README.md ADDED
@@ -0,0 +1,43 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: en
+ datasets:
+ - fsc
+ license: cc-by-4.0
+ ---
+ ## ESPnet2 ASR pretrained model
+ ### `siddhana/fsc_asr_train_asr_hubert_transformer_adam_specaug_raw_en_word_valid.acc.ave_5best`
+ ♻️ Imported from https://zenodo.org/record/5590204
+
+ This model was trained by siddhana using fsc/asr1 recipe in [espnet](https://github.com/espnet/espnet/).
+ ### Demo: How to use in ESPnet2
+ ```python
+ # coming soon
+ ```
+ ### Citing ESPnet
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+ ```
+ or arXiv:
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Enrique Yalta Soplin and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
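The README's demo section is still a placeholder. As a small, hedged illustration of what a caller might do with this model's output: the word-level `token_list` in `config.yaml` mixes ordinary words with underscore-joined labels such as `activate_lights_kitchen`, which suggests (an assumption, not stated in the commit) that each hypothesis begins with an intent label followed by the transcript. The helper names `split_intent` and `split_slots` below are hypothetical, and the example sentence is made up from vocabulary entries.

```python
from typing import Optional, Tuple


def split_intent(hypothesis: str) -> Tuple[Optional[str], str]:
    """Separate a leading intent label from the transcript.

    Assumption: the intent, if present, is the first word and contains
    an underscore (e.g. "activate_lights_kitchen").
    """
    words = hypothesis.split()
    if words and "_" in words[0]:
        return words[0], " ".join(words[1:])
    return None, " ".join(words)


def split_slots(intent: str) -> Tuple[str, str, str]:
    """Split an intent label into (action, object, location).

    Splitting from the right keeps multi-word actions such as
    "change_language" intact.
    """
    action, obj, location = intent.rsplit("_", 2)
    return action, obj, location


intent, transcript = split_intent(
    "activate_lights_kitchen Turn on the lights in the kitchen"
)
print(intent, "|", transcript)
print(split_slots(intent))
print(split_slots("change_language_Chinese_none"))
```

Splitting from the right is a design choice driven by labels like `change_language_none_none`, where the action itself contains an underscore.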
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/RESULTS.md ADDED
@@ -0,0 +1,29 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Mon Oct 11 12:39:01 EDT 2021`
+ - python version: `3.9.5 (default, Jun 4 2021, 12:28:51) [GCC 7.5.0]`
+ - espnet version: `espnet 0.10.3a2`
+ - pytorch version: `pytorch 1.8.1+cu102`
+ - Git hash: `8ef7bd675815ae2fbaba930a53c8ad4ae0ad19af`
+ - Commit date: `Sat Sep 11 10:05:59 2021 +0900`
+
+ ## asr_train_asr_hubert_transformer_adam_specaug_raw_en_word
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |inference_asr_model_valid.acc.ave_5best/test|3793|20316|99.7|0.2|0.1|0.1|0.4|0.9|
+ |inference_asr_model_valid.acc.ave_5best/valid|3118|16751|99.1|0.7|0.2|0.2|1.2|2.7|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |inference_asr_model_valid.acc.ave_5best/test|3793|172445|99.9|0.1|0.1|0.1|0.2|0.9|
+ |inference_asr_model_valid.acc.ave_5best/valid|3118|142122|99.6|0.2|0.2|0.2|0.6|2.7|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
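For readers unfamiliar with this table layout, the `Err` column is the sum of substitution, deletion, and insertion rates, all expressed as percentages of the reference word count (`Wrd`). The sketch below recomputes that sum for the two WER rows and estimates the absolute number of word errors. The function name `summarize` is illustrative only, and because every column is rounded to one decimal, the results are approximations: the valid row sums to 1.1 against a reported 1.2, presumably because each column was rounded independently.

```python
# Rows copied from the WER table:
# (split, reference words, Sub %, Del %, Ins %, reported Err %)
ROWS = [
    ("test", 20316, 0.2, 0.1, 0.1, 0.4),
    ("valid", 16751, 0.7, 0.2, 0.2, 1.2),
]


def summarize(n_words, sub, dele, ins, reported_err):
    """Return (Sub + Del + Ins, approximate count of word errors)."""
    recomputed = round(sub + dele + ins, 1)
    approx_errors = round(n_words * reported_err / 100)
    return recomputed, approx_errors


for name, *row in ROWS:
    recomputed, approx_errors = summarize(*row)
    print(f"{name}: Sub+Del+Ins = {recomputed}%, about {approx_errors} word errors")
```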
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/config.yaml ADDED
@@ -0,0 +1,355 @@
+ config: conf/tuning/train_asr_hubert_transformer_adam_specaug.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 200
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - train
+     - loss
+     - min
+ -   - valid
+     - loss
+     - min
+ -   - train
+     - acc
+     - max
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 5
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param:
+ - frontend.upstream
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_en_word/train/speech_shape
+ - exp/asr_stats_raw_en_word/train/text_shape.word
+ valid_shape_file:
+ - exp/asr_stats_raw_en_word/valid/speech_shape
+ - exp/asr_stats_raw_en_word/valid/text_shape.word
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/valid/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/valid/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 0.0002
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 25000
+ token_list:
+ - <blank>
+ - <unk>
+ - the
+ - Turn
+ - in
+ - lights
+ - 'on'
+ - up
+ - down
+ - temperature
+ - heat
+ - 'off'
+ - Switch
+ - increase_volume_none
+ - kitchen
+ - language
+ - decrease_volume_none
+ - bedroom
+ - washroom
+ - volume
+ - my
+ - to
+ - bathroom
+ - Decrease
+ - increase_heat_washroom
+ - decrease_heat_washroom
+ - Increase
+ - music
+ - heating
+ - Bring
+ - increase_heat_none
+ - decrease_heat_none
+ - me
+ - change_language_none_none
+ - activate_lights_washroom
+ - Set
+ - Lights
+ - activate_lights_kitchen
+ - I
+ - activate_music_none
+ - too
+ - it
+ - increase_heat_bedroom
+ - decrease_heat_bedroom
+ - sound
+ - increase_heat_kitchen
+ - decrease_heat_kitchen
+ - deactivate_music_none
+ - lamp
+ - Make
+ - deactivate_lights_bedroom
+ - deactivate_lights_kitchen
+ - bring_newspaper_none
+ - newspaper
+ - activate_lights_bedroom
+ - bring_socks_none
+ - socks
+ - bring_shoes_none
+ - shoes
+ - need
+ - Volume
+ - activate_lights_none
+ - deactivate_lights_none
+ - bring_juice_none
+ - juice
+ - deactivate_lights_washroom
+ - change_language_Chinese_none
+ - deactivate_lamp_none
+ - activate_lamp_none
+ - Kitchen
+ - turn
+ - some
+ - Could
+ - you
+ - Bedroom
+ - Go
+ - get
+ - Washroom
+ - Chinese
+ - phone's
+ - change_language_English_none
+ - Get
+ - change_language_Korean_none
+ - OK
+ - now
+ - switch
+ - main
+ - change_language_German_none
+ - practice
+ - Louder
+ - Stop
+ - loud
+ - increase
+ - Play
+ - hear
+ - Change
+ - quiet
+ - Bathroom
+ - Fetch
+ - Korean
+ - English
+ - German
+ - Pause
+ - Lamp
+ - Resume
+ - louder
+ - Heat
+ - audio
+ - Its
+ - loud,
+ - heating?
+ - Far
+ - a
+ - different
+ - please?
+ - decrease
+ - Too
+ - settings
+ - Put
+ - Start
+ - Quieter
+ - please
+ - Thats
+ - softer
+ - max
+ - mute
+ - lower
+ - phone
+ - couldn't
+ - anything,
+ - Reduce
+ - this,
+ - More
+ - That's
+ - Lower
+ - levels
+ - Use
+ - hotter
+ - languages
+ - Allow
+ - can't
+ - that
+ - Less
+ - system
+ - cooler
+ - This
+ - video
+ - is
+ - low,
+ - device
+ - Chinese.
+ - quieter
+ - English.
+ - Language
+ - Open
+ - German.
+ - Korean.
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ use_preprocessor: true
+ token_type: word
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: hubert_large_ll60k
+     download_dir: ./hub
+     multilayer_feature: true
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 30
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_range:
+     - 0
+     - 40
+     num_time_mask: 2
+ normalize: utterance_mvn
+ normalize_conf: {}
+ preencoder: linear
+ preencoder_conf:
+     input_size: 1024
+     output_size: 80
+ encoder: transformer
+ encoder_conf:
+     output_size: 256
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.0
+     input_layer: conv2d
+     normalize_before: true
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.0
+     src_attention_dropout_rate: 0.0
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.3a2
+ distributed: false
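The config pairs `optim: adam` (`lr: 0.0002`) with `scheduler: warmuplr` (`warmup_steps: 25000`). As a rough sketch of what that combination implies, assuming ESPnet2's `warmuplr` follows the usual Noam-style rule (linear ramp to the base rate, then inverse-square-root decay), the learning rate at a given optimizer step could be computed as below. The function name `warmup_lr` is illustrative and is not taken from the commit.

```python
def warmup_lr(step: int, base_lr: float = 0.0002, warmup_steps: int = 25000) -> float:
    """Noam-style warmup: linear increase until `warmup_steps`, then 1/sqrt(step) decay.

    Assumed form: base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)
    `step` must be >= 1.
    """
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


for step in (1, 12500, 25000, 100000):
    print(step, f"{warmup_lr(step):.3e}")
```

Under this assumption the rate peaks at exactly the configured `lr` when `step == warmup_steps`, and it has halved again by four times that many steps.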
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/acc.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/backward_time.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/cer.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/cer_ctc.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/forward_time.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/iter_time.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/loss.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/loss_att.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/loss_ctc.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/optim0_lr0.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/optim_step_time.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/train_time.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/images/wer.png ADDED
exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/valid.acc.ave_5best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d2adc40ef8aaace766e9aa307cc49e78419a516a75fa58b3eabecc9e857fc5a4
+ size 1375946815
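The three lines above are a Git LFS pointer, not the checkpoint itself: they record the hash algorithm, the digest, and the byte size of the real `valid.acc.ave_5best.pth`. A minimal sketch of checking a downloaded file against such a pointer, using only the standard library, might look like this. The helper names `parse_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib
from pathlib import Path

# Pointer text as committed for valid.acc.ave_5best.pth.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d2adc40ef8aaace766e9aa307cc49e78419a516a75fa58b3eabecc9e857fc5a4
size 1375946815
"""


def parse_pointer(text: str) -> dict:
    """Turn 'key value' pointer lines into algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict) -> bool:
    """Check size first (cheap), then the streamed hash of the file."""
    file = Path(path)
    if file.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with file.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Reading in 1 MiB chunks keeps memory use flat, which matters for a checkpoint of roughly 1.4 GB.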
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: 0.10.3a2
+ files:
+   asr_model_file: exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/valid.acc.ave_5best.pth
+ python: "3.9.5 (default, Jun 4 2021, 12:28:51) \n[GCC 7.5.0]"
+ timestamp: 1634830330.00052
+ torch: 1.8.1+cu102
+ yaml_files:
+   asr_train_config: exp/asr_train_asr_hubert_transformer_adam_specaug_raw_en_word/config.yaml
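The `timestamp` field in `meta.yaml` is not self-describing. Assuming it is seconds since the Unix epoch (which is consistent with the October 2021 dates in `RESULTS.md`), it can be converted to a readable UTC time as follows:

```python
from datetime import datetime, timezone

# Value copied from meta.yaml; interpreting it as Unix epoch seconds is an assumption.
PACKED_AT = 1634830330.00052

packed = datetime.fromtimestamp(PACKED_AT, tz=timezone.utc)
print(packed.strftime("%Y-%m-%d %H:%M:%S %Z"))
```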