“siddhu001” committed on
Commit
3e851ef
1 Parent(s): 4431bb7

Update model

Files changed (20)
  1. README.md +331 -0
  2. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/RESULTS.md +46 -0
  3. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/config.yaml +211 -0
  4. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/acc.png +0 -0
  5. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/backward_time.png +0 -0
  6. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/cer.png +0 -0
  7. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/clip.png +0 -0
  8. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/forward_time.png +0 -0
  9. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/gpu_max_cached_mem_GB.png +0 -0
  10. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/grad_norm.png +0 -0
  11. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/iter_time.png +0 -0
  12. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/loss.png +0 -0
  13. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/loss_att.png +0 -0
  14. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/loss_scale.png +0 -0
  15. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/optim0_lr0.png +0 -0
  16. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/optim_step_time.png +0 -0
  17. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/train_time.png +0 -0
  18. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/wer.png +0 -0
  19. exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/valid.loss.ave_10best.pth +3 -0
  20. meta.yaml +8 -0
README.md ADDED
@@ -0,0 +1,331 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: en
+ datasets:
+ - slue-voxceleb
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `espnet/sluevoxceleb_whisper_finetune_sa`
+
+ This model was trained by “siddhu001” using slue-voxceleb recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ ```bash
+ cd espnet
+ git checkout e23ef85f0b3116ad5c60d0833f186da0deec0734
+ pip install -e .
+ cd egs2/slue-voxceleb/slu1_superb_correct
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/sluevoxceleb_whisper_finetune_sa
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Wed Feb 7 23:47:19 CST 2024`
+ - python version: `3.9.13 (main, Aug 25 2022, 23:26:10) [GCC 11.2.0]`
+ - espnet version: `espnet 202310`
+ - pytorch version: `pytorch 2.1.0+cu121`
+ - Git hash: `21d2105784e4da98397bf487b2550d4c6e16d40d`
+ - Commit date: `Wed Jan 31 13:40:37 2024 -0600`
+
+ ## exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_slu_model_valid.loss.ave/devel|1436|1436|80.6|19.4|0.0|0.0|19.4|19.4|
+ |decode_asr_slu_model_valid.loss.ave/test|3426|3426|81.6|18.4|0.0|0.0|18.4|18.4|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_slu_model_valid.loss.ave/devel|1436|10365|83.1|15.2|1.8|0.9|17.8|19.4|
+ |decode_asr_slu_model_valid.loss.ave/test|3426|24887|84.4|13.8|1.8|0.7|16.3|18.4|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ ## exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/decode_asr_slu_model_valid.loss.ave
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |org/devel|1437|1437|80.6|19.4|0.0|0.0|19.4|19.4|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |org/devel|1437|10372|83.1|15.2|1.8|0.9|17.8|19.4|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/train_asr_whisper_weighted_finetune_0.00001.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: false
+ dry_run: false
+ iterator_type: sequence
+ valid_iterator_type: null
+ output_dir: exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 55405
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 50
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ - - valid
+   - loss
+   - min
+ - - train
+   - loss
+   - min
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 2
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_lora: false
+ save_lora_only: true
+ lora_conf: {}
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 64
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/slu_stats_raw_en_word_sp/train/speech_shape
+ - exp/slu_stats_raw_en_word_sp/train/text_shape.word
+ valid_shape_file:
+ - exp/slu_stats_raw_en_word_sp/valid/speech_shape
+ - exp/slu_stats_raw_en_word_sp/valid/text_shape.word
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ train_data_path_and_name_and_type:
+ - - dump/raw/train_sp/wav.scp
+   - speech
+   - sound
+ - - dump/raw/train_sp/text
+   - text
+   - text
+ valid_data_path_and_name_and_type:
+ - - dump/raw/devel/wav.scp
+   - speech
+   - sound
+ - - dump/raw/devel/text
+   - text
+   - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+   lr: 1.0e-05
+ scheduler: warmuplr
+ scheduler_conf:
+   warmup_steps: 1000
+ token_list:
+ - <blank>
+ - <unk>
+ - Neutral
+ - Positive
+ - Negative
+ - <sos/eos>
+ transcript_token_list: null
+ two_pass: false
+ pre_postencoder_norm: false
+ init: null
+ input_size: 1
+ ctc_conf:
+   dropout_rate: 0.0
+   ctc_type: builtin
+   reduce: true
+   ignore_nan_grad: null
+   zero_infinity: true
+   brctc_risk_strategy: exp
+   brctc_group_strategy: end
+   brctc_risk_factor: 0.0
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: word
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ frontend: null
+ frontend_conf: {}
+ specaug: null
+ specaug_conf: {}
+ normalize: null
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+   ctc_weight: 0.0
+   lsm_weight: 0.1
+   length_normalized_loss: false
+   superb_setup_encoder: true
+   num_class: 3
+   ssl_input_size: 1024
+   weighted_sum: true
+   extract_feats_in_collect_stats: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: whisper
+ encoder_conf:
+   whisper_model: medium
+   dropout_rate: 0.0
+   use_specaug: true
+   specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 40
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.12
+     num_time_mask: 5
+ prepostencoder: null
+ prepostencoder_conf: {}
+ postencoder: null
+ postencoder_conf: {}
+ deliberationencoder: null
+ deliberationencoder_conf: {}
+ decoder: rnn
+ decoder_conf: {}
+ postdecoder: null
+ postdecoder_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202310'
+ distributed: true
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/RESULTS.md ADDED
@@ -0,0 +1,46 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Wed Feb 7 23:47:19 CST 2024`
+ - python version: `3.9.13 (main, Aug 25 2022, 23:26:10) [GCC 11.2.0]`
+ - espnet version: `espnet 202310`
+ - pytorch version: `pytorch 2.1.0+cu121`
+ - Git hash: `21d2105784e4da98397bf487b2550d4c6e16d40d`
+ - Commit date: `Wed Jan 31 13:40:37 2024 -0600`
+
+ ## exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_slu_model_valid.loss.ave/devel|1436|1436|80.6|19.4|0.0|0.0|19.4|19.4|
+ |decode_asr_slu_model_valid.loss.ave/test|3426|3426|81.6|18.4|0.0|0.0|18.4|18.4|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_slu_model_valid.loss.ave/devel|1436|10365|83.1|15.2|1.8|0.9|17.8|19.4|
+ |decode_asr_slu_model_valid.loss.ave/test|3426|24887|84.4|13.8|1.8|0.7|16.3|18.4|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ ## exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/decode_asr_slu_model_valid.loss.ave
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |org/devel|1437|1437|80.6|19.4|0.0|0.0|19.4|19.4|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |org/devel|1437|10372|83.1|15.2|1.8|0.9|17.8|19.4|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
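Reading the tables above: in both WER rows `Snt` equals `Wrd`, and the token list contains only the three sentiment labels, which suggests each utterance is scored as a single label token. Under that reading (an inference from the numbers, not something the commit states), `Corr` in the WER table is classification accuracy and `Err` is `Sub + Del + Ins`. A minimal Python sketch that checks this arithmetic on values copied from the tables; the helper names are hypothetical:

```python
# Percentages copied from the WER and CER tables of RESULTS.md.
WER_ROWS = {
    "devel": {"snt": 1436, "wrd": 1436, "corr": 80.6, "sub": 19.4, "del": 0.0, "ins": 0.0, "err": 19.4},
    "test": {"snt": 3426, "wrd": 3426, "corr": 81.6, "sub": 18.4, "del": 0.0, "ins": 0.0, "err": 18.4},
}
CER_ROWS = {
    "devel": {"snt": 1436, "wrd": 10365, "corr": 83.1, "sub": 15.2, "del": 1.8, "ins": 0.9, "err": 17.8},
    "test": {"snt": 3426, "wrd": 24887, "corr": 84.4, "sub": 13.8, "del": 1.8, "ins": 0.7, "err": 16.3},
}


def err_from_parts(row):
    """Error rate as substitutions + deletions + insertions (hypothetical helper)."""
    return row["sub"] + row["del"] + row["ins"]


def is_consistent(row, tol=0.15):
    """True if the reported Err matches Sub + Del + Ins up to one-decimal rounding."""
    return abs(err_from_parts(row) - row["err"]) <= tol


for name, row in WER_ROWS.items():
    tokens_per_utt = row["wrd"] / row["snt"]
    print(f"WER {name}: tokens/utt={tokens_per_utt}, accuracy={row['corr']}%, "
          f"consistent={is_consistent(row)}")

for name, row in CER_ROWS.items():
    print(f"CER {name}: consistent={is_consistent(row)}")
```

The tolerance is there because each column is rounded to one decimal independently, so the parts can differ from the reported total by about 0.1.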
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/config.yaml ADDED
@@ -0,0 +1,211 @@
+ config: conf/train_asr_whisper_weighted_finetune_0.00001.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: false
+ dry_run: false
+ iterator_type: sequence
+ valid_iterator_type: null
+ output_dir: exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 55405
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 50
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ - - valid
+   - loss
+   - min
+ - - train
+   - loss
+   - min
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 2
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_lora: false
+ save_lora_only: true
+ lora_conf: {}
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 64
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/slu_stats_raw_en_word_sp/train/speech_shape
+ - exp/slu_stats_raw_en_word_sp/train/text_shape.word
+ valid_shape_file:
+ - exp/slu_stats_raw_en_word_sp/valid/speech_shape
+ - exp/slu_stats_raw_en_word_sp/valid/text_shape.word
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ train_data_path_and_name_and_type:
+ - - dump/raw/train_sp/wav.scp
+   - speech
+   - sound
+ - - dump/raw/train_sp/text
+   - text
+   - text
+ valid_data_path_and_name_and_type:
+ - - dump/raw/devel/wav.scp
+   - speech
+   - sound
+ - - dump/raw/devel/text
+   - text
+   - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+   lr: 1.0e-05
+ scheduler: warmuplr
+ scheduler_conf:
+   warmup_steps: 1000
+ token_list:
+ - <blank>
+ - <unk>
+ - Neutral
+ - Positive
+ - Negative
+ - <sos/eos>
+ transcript_token_list: null
+ two_pass: false
+ pre_postencoder_norm: false
+ init: null
+ input_size: 1
+ ctc_conf:
+   dropout_rate: 0.0
+   ctc_type: builtin
+   reduce: true
+   ignore_nan_grad: null
+   zero_infinity: true
+   brctc_risk_strategy: exp
+   brctc_group_strategy: end
+   brctc_risk_factor: 0.0
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: word
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ frontend: null
+ frontend_conf: {}
+ specaug: null
+ specaug_conf: {}
+ normalize: null
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+   ctc_weight: 0.0
+   lsm_weight: 0.1
+   length_normalized_loss: false
+   superb_setup_encoder: true
+   num_class: 3
+   ssl_input_size: 1024
+   weighted_sum: true
+   extract_feats_in_collect_stats: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: whisper
+ encoder_conf:
+   whisper_model: medium
+   dropout_rate: 0.0
+   use_specaug: true
+   specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 40
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.12
+     num_time_mask: 5
+ prepostencoder: null
+ prepostencoder_conf: {}
+ postencoder: null
+ postencoder_conf: {}
+ deliberationencoder: null
+ deliberationencoder_conf: {}
+ decoder: rnn
+ decoder_conf: {}
+ postdecoder: null
+ postdecoder_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202310'
+ distributed: true
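The config pairs `optim: adam` with `lr: 1.0e-05`, `scheduler: warmuplr` and `warmup_steps: 1000`. As a rough illustration of what that schedule does, here is a standalone Python sketch of the Noam-style warmup rule that, to my understanding, ESPnet's `WarmupLR` scheduler applies. It does not import ESPnet, and the function name `warmup_lr` is hypothetical:

```python
def warmup_lr(step, base_lr=1.0e-05, warmup_steps=1000):
    """Learning rate at a given optimizer step (1-based).

    Assumed rule: lr = base_lr * warmup_steps**0.5
                       * min(step**-0.5, step * warmup_steps**-1.5)
    It rises linearly up to `warmup_steps`, peaks at `base_lr`,
    then decays with the inverse square root of the step.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# Print the rate at a few steps around the warmup boundary.
for step in (1, 500, 1000, 4000):
    print(step, warmup_lr(step))
```

Under this rule the configured `lr` is the peak value reached at step 1000, not a constant rate. With `accum_grad: 2`, an optimizer step presumably corresponds to two forward/backward passes.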
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/acc.png ADDED
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/backward_time.png ADDED
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/cer.png ADDED
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/clip.png ADDED
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/forward_time.png ADDED
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/gpu_max_cached_mem_GB.png ADDED
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/grad_norm.png ADDED
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/iter_time.png ADDED
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/loss.png ADDED
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/loss_att.png ADDED
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/loss_scale.png ADDED
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/optim0_lr0.png ADDED
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/optim_step_time.png ADDED
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/train_time.png ADDED
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/images/wer.png ADDED
exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/valid.loss.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:79e70c00450663b8fa6b369d0c797c5dddbe94057d18873ad1e24d6361db1c6a
+ size 1233216410
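The three lines added for the `.pth` file are a Git LFS pointer, not the checkpoint itself: the repository stores the spec version, the SHA-256 of the real file, and its size in bytes. A small Python sketch of how such a pointer can be read; the function name `parse_lfs_pointer` is hypothetical, and the parsing assumes the simple `key value` layout shown in the diff:

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:79e70c00450663b8fa6b369d0c797c5dddbe94057d18873ad1e24d6361db1c6a
size 1233216410
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into version, hash algorithm, digest and size."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], len(pointer["digest"]), "hex chars")
print(f"{pointer['size'] / 1e9:.2f} GB")
```

After downloading the checkpoint, the digest can be compared against `hashlib.sha256` of the file contents to confirm the download is intact.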
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202310'
+ files:
+   slu_model_file: exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/valid.loss.ave_10best.pth
+ python: "3.9.13 (main, Aug 25 2022, 23:26:10) \n[GCC 11.2.0]"
+ timestamp: 1715350211.108197
+ torch: 2.1.0+cu121
+ yaml_files:
+   slu_train_config: exp/slu_train_asr_whisper_weighted_finetune_0.00001_raw_en_word_sp/config.yaml
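The `timestamp` field in `meta.yaml` looks like a Unix epoch value in seconds. That is an assumption, as the commit does not say so. Under that assumption it can be converted to a readable UTC time with the Python standard library:

```python
from datetime import datetime, timezone

# Value copied from meta.yaml; assumed to be seconds since the Unix epoch.
PACKAGED_AT = 1715350211.108197

packaged = datetime.fromtimestamp(PACKAGED_AT, tz=timezone.utc)
print(packaged.isoformat())
```

If the assumption holds, the model was packaged in May 2024, about three months after the `Wed Feb 7 ... 2024` date recorded in the results environment.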