jungjee committed on
Commit
c59321c
1 Parent(s): 9a950c7

Update model

Files changed (23)
  1. README.md +291 -0
  2. meta.yaml +8 -0
  3. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/18epoch.pth +3 -0
  4. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/RESULTS.md +17 -0
  5. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/config.yaml +200 -0
  6. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/backward_time.png +0 -0
  7. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/clip.png +0 -0
  8. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/eer.png +0 -0
  9. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/forward_time.png +0 -0
  10. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/gpu_max_cached_mem_GB.png +0 -0
  11. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/grad_norm.png +0 -0
  12. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/iter_time.png +0 -0
  13. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/loss.png +0 -0
  14. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/loss_scale.png +0 -0
  15. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/mindcf.png +0 -0
  16. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/n_trials.png +0 -0
  17. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/nontrg_mean.png +0 -0
  18. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/nontrg_std.png +0 -0
  19. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/optim0_lr0.png +0 -0
  20. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/optim_step_time.png +0 -0
  21. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/train_time.png +0 -0
  22. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/trg_mean.png +0 -0
  23. save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/trg_std.png +0 -0
README.md ADDED
@@ -0,0 +1,291 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - speaker-recognition
+ language: multilingual
+ datasets:
+ - voxceleb
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 SPK model
+
+ ### `espnet/voxcelebs12_ska_wavlm_frozen`
+
+ This model was trained by Jungjee using the VoxCeleb recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ ```bash
+ cd espnet
+ git checkout ea74d1c7482bf5b3b4f90410d1ca8521fd9a566b
+ pip install -e .
+ cd egs2/voxceleb/spk1
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/voxcelebs12_ska_wavlm_frozen
+ ```
+
+ <!-- Generated by scripts/utils/show_spk_result.py -->
+ # RESULTS
+ ## Environments
+ date: 2024-01-01 15:49:24.125685
+
+ - python version: 3.9.16 (main, Mar 8 2023, 14:00:05) [GCC 11.2.0]
+ - espnet version: 202310
+ - pytorch version: 2.0.1
+
+ | | Mean | Std |
+ |---|---|---|
+ | Target | 8.1076 | 3.4943 |
+ | Non-target | 2.1763 | 2.1763 |
+
+ | Model name | EER(%) | minDCF |
+ |---|---|---|
+ | conf/tuning/train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample | 0.564 | 0.05488 |
+
+ ## SPK config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: true
+ dry_run: false
+ iterator_type: category
+ valid_iterator_type: sequence
+ output_dir: exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 6
+ num_att_plot: 0
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 49631
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: true
+ cudnn_deterministic: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 40
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - eer
+     - min
+ keep_nbest_models: 3
+ nbest_averaging_interval: 0
+ grad_clip: 9999
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 8
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: 100
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_lora: false
+ save_lora_only: true
+ lora_conf: {}
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param:
+ - frontend.upstream
+ num_iters_per_epoch: null
+ batch_size: 64
+ valid_batch_size: 5
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/spk_stats_16k_sp/train/speech_shape
+ valid_shape_file:
+ - exp/spk_stats_16k_sp/valid/speech_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 120000
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ train_data_path_and_name_and_type:
+ -   - dump/raw/voxceleb12_devs_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/voxceleb12_devs_sp/utt2spk
+     - spk_labels
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/voxceleb1_test/trial.scp
+     - speech
+     - sound
+ -   - dump/raw/voxceleb1_test/trial2.scp
+     - speech2
+     - sound
+ -   - dump/raw/voxceleb1_test/trial_label
+     - spk_labels
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.001
+     weight_decay: 5.0e-05
+     amsgrad: false
+ scheduler: cosineannealingwarmuprestarts
+ scheduler_conf:
+     first_cycle_steps: 71280
+     cycle_mult: 1.0
+     max_lr: 0.001
+     min_lr: 5.0e-06
+     warmup_steps: 1000
+     gamma: 0.75
+ init: null
+ use_preprocessor: true
+ input_size: null
+ target_duration: 3.0
+ spk2utt: dump/raw/voxceleb12_devs_sp/spk2utt
+ spk_num: 21615
+ sample_rate: 16000
+ num_eval: 10
+ rir_scp: ''
+ model_conf:
+     extract_feats_in_collect_stats: false
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: wavlm_large
+     download_dir: ./hub
+     multilayer_feature: true
+ specaug: null
+ specaug_conf: {}
+ normalize: utterance_mvn
+ normalize_conf:
+     norm_vars: false
+ encoder: ska_tdnn
+ encoder_conf:
+     model_scale: 8
+     ndim: 1024
+     ska_dim: 128
+     output_size: 1536
+ pooling: chn_attn_stat
+ pooling_conf: {}
+ projector: ska_tdnn
+ projector_conf:
+     output_size: 192
+ preprocessor: spk
+ preprocessor_conf:
+     target_duration: 3.0
+     sample_rate: 16000
+     num_eval: 5
+     noise_apply_prob: 0.5
+     noise_info:
+     -   - 1.0
+         - dump/raw/musan_speech.scp
+         -   - 4
+             - 7
+         -   - 13
+             - 20
+     -   - 1.0
+         - dump/raw/musan_noise.scp
+         -   - 1
+             - 1
+         -   - 0
+             - 15
+     -   - 1.0
+         - dump/raw/musan_music.scp
+         -   - 1
+             - 1
+         -   - 5
+             - 15
+     rir_apply_prob: 0.5
+     rir_scp: dump/raw/rirs.scp
+ loss: aamsoftmax_sc_topk
+ loss_conf:
+     margin: 0.3
+     scale: 30
+     K: 3
+     mp: 0.06
+     k_top: 5
+ required:
+ - output_dir
+ version: '202310'
+ distributed: true
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
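Review note on the README's `scheduler_conf`: the values describe a cosine learning-rate schedule with linear warmup and periodic restarts. The sketch below is a from-scratch illustration of that kind of schedule, not ESPnet's scheduler class. The function name `lr_at_step` is hypothetical, and two behaviours are assumptions: warmup starts from `min_lr`, and `gamma` shrinks the peak rate after every restart.

```python
import math


def lr_at_step(step, first_cycle_steps=71280, max_lr=1e-3, min_lr=5e-6,
               warmup_steps=1000, gamma=0.75):
    """Illustrative learning rate at a 0-based optimizer step (cycle_mult = 1.0)."""
    # With cycle_mult = 1.0 every cycle has the same length.
    cycle, pos = divmod(step, first_cycle_steps)
    # Assumption: the peak rate decays by gamma after each restart.
    peak = max_lr * gamma ** cycle
    if pos < warmup_steps:
        # Linear warmup from min_lr up to the current peak.
        return min_lr + (peak - min_lr) * pos / warmup_steps
    # Cosine decay from the peak back down to min_lr.
    progress = (pos - warmup_steps) / (first_cycle_steps - warmup_steps)
    return min_lr + (peak - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 1000, 36140, 71280, 72280):
    print(step, lr_at_step(step))
```

Under these assumptions the rate reaches `max_lr` at step 1000, falls back to `min_lr` at the restart on step 71280, and peaks at 0.75 × `max_lr` in the second cycle.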
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202310'
+ files:
+   model_file: save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/18epoch.pth
+ python: "3.9.16 (main, Mar 8 2023, 14:00:05) \n[GCC 11.2.0]"
+ timestamp: 1704234993.142382
+ torch: 2.0.1
+ yaml_files:
+   train_config: save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/config.yaml
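Review note on `timestamp` in meta.yaml: the value looks like POSIX seconds since the epoch, which is an assumption here rather than something the file states. Under that assumption, a minimal sketch for turning it into a readable UTC time is shown below; the helper name `meta_timestamp_to_utc` is hypothetical.

```python
from datetime import datetime, timezone


def meta_timestamp_to_utc(timestamp):
    """Interpret a meta.yaml timestamp as POSIX seconds and return an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


packed_at = meta_timestamp_to_utc(1704234993.142382)
print(packed_at.strftime("%Y-%m-%d %H:%M:%S UTC"))
```

Read this way, the model was packed on 2024-01-02, the day after the date recorded in the RESULTS section.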
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/18epoch.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:03d6059cb59494055afdfd51d62ad5020d1e4b292e815a51be99aaf00fce6c48
+ size 2074845418
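Review note on `18epoch.pth`: the three added lines are a Git LFS pointer, not the checkpoint itself. The sketch below parses such a pointer with the standard library to recover the hash and the real file size; `parse_lfs_pointer` is a hypothetical helper written for this note, not part of Git LFS or ESPnet.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:03d6059cb59494055afdfd51d62ad5020d1e4b292e815a51be99aaf00fce6c48
size 2074845418
"""

info = parse_lfs_pointer(POINTER)
print(f"{info['hash_algo']} checkpoint, {info['size_bytes'] / 2**30:.2f} GiB")
```

The size works out to roughly 1.93 GiB, so anyone fetching the model needs Git LFS (or a hub download) rather than a plain clone.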
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/RESULTS.md ADDED
@@ -0,0 +1,17 @@
+ <!-- Generated by scripts/utils/show_spk_result.py -->
+ # RESULTS
+ ## Environments
+ date: 2024-01-01 15:49:24.125685
+
+ - python version: 3.9.16 (main, Mar 8 2023, 14:00:05) [GCC 11.2.0]
+ - espnet version: 202310
+ - pytorch version: 2.0.1
+
+ | | Mean | Std |
+ |---|---|---|
+ | Target | 8.1076 | 3.4943 |
+ | Non-target | 2.1763 | 2.1763 |
+
+ | Model name | EER(%) | minDCF |
+ |---|---|---|
+ | conf/tuning/train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample | 0.564 | 0.05488 |
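Review note on the `EER(%)` column: the equal error rate is the operating point where the false-acceptance rate on non-target trials equals the false-rejection rate on target trials. The sketch below approximates it from a list of trial scores. It is a plain-Python illustration, not the scoring code that produced RESULTS.md; the function name `equal_error_rate` and the toy scores are made up for the example, and tied scores are not given special treatment.

```python
def equal_error_rate(scores, labels):
    """Approximate EER (as a fraction) from trial scores and 0/1 labels (1 = target)."""
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")
    # Lower the threshold one trial at a time, accepting the highest scores first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    accepted_targets = 0
    accepted_nontargets = 0
    # Before anything is accepted: FRR = 1 and FAR = 0.
    best_gap, eer = 1.0, 0.5
    for _, label in ranked:
        if label == 1:
            accepted_targets += 1
        else:
            accepted_nontargets += 1
        far = accepted_nontargets / n_nontarget      # false-acceptance rate
        frr = 1.0 - accepted_targets / n_target      # false-rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy trials (hypothetical scores): one non-target outranks one target.
print(equal_error_rate([0.9, 0.6, 0.7, 0.2], [1, 1, 0, 0]))
```

Multiplying the returned fraction by 100 gives the percentage form used in the table.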
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/config.yaml ADDED
@@ -0,0 +1,200 @@
+ config: conf/tuning/train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: true
+ dry_run: false
+ iterator_type: category
+ valid_iterator_type: sequence
+ output_dir: exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 6
+ num_att_plot: 0
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 49631
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: true
+ cudnn_deterministic: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 40
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - eer
+     - min
+ keep_nbest_models: 3
+ nbest_averaging_interval: 0
+ grad_clip: 9999
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 8
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: 100
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_lora: false
+ save_lora_only: true
+ lora_conf: {}
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param:
+ - frontend.upstream
+ num_iters_per_epoch: null
+ batch_size: 64
+ valid_batch_size: 5
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/spk_stats_16k_sp/train/speech_shape
+ valid_shape_file:
+ - exp/spk_stats_16k_sp/valid/speech_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 120000
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ train_data_path_and_name_and_type:
+ -   - dump/raw/voxceleb12_devs_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/voxceleb12_devs_sp/utt2spk
+     - spk_labels
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/voxceleb1_test/trial.scp
+     - speech
+     - sound
+ -   - dump/raw/voxceleb1_test/trial2.scp
+     - speech2
+     - sound
+ -   - dump/raw/voxceleb1_test/trial_label
+     - spk_labels
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.001
+     weight_decay: 5.0e-05
+     amsgrad: false
+ scheduler: cosineannealingwarmuprestarts
+ scheduler_conf:
+     first_cycle_steps: 71280
+     cycle_mult: 1.0
+     max_lr: 0.001
+     min_lr: 5.0e-06
+     warmup_steps: 1000
+     gamma: 0.75
+ init: null
+ use_preprocessor: true
+ input_size: null
+ target_duration: 3.0
+ spk2utt: dump/raw/voxceleb12_devs_sp/spk2utt
+ spk_num: 21615
+ sample_rate: 16000
+ num_eval: 10
+ rir_scp: ''
+ model_conf:
+     extract_feats_in_collect_stats: false
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: wavlm_large
+     download_dir: ./hub
+     multilayer_feature: true
+ specaug: null
+ specaug_conf: {}
+ normalize: utterance_mvn
+ normalize_conf:
+     norm_vars: false
+ encoder: ska_tdnn
+ encoder_conf:
+     model_scale: 8
+     ndim: 1024
+     ska_dim: 128
+     output_size: 1536
+ pooling: chn_attn_stat
+ pooling_conf: {}
+ projector: ska_tdnn
+ projector_conf:
+     output_size: 192
+ preprocessor: spk
+ preprocessor_conf:
+     target_duration: 3.0
+     sample_rate: 16000
+     num_eval: 5
+     noise_apply_prob: 0.5
+     noise_info:
+     -   - 1.0
+         - dump/raw/musan_speech.scp
+         -   - 4
+             - 7
+         -   - 13
+             - 20
+     -   - 1.0
+         - dump/raw/musan_noise.scp
+         -   - 1
+             - 1
+         -   - 0
+             - 15
+     -   - 1.0
+         - dump/raw/musan_music.scp
+         -   - 1
+             - 1
+         -   - 5
+             - 15
+     rir_apply_prob: 0.5
+     rir_scp: dump/raw/rirs.scp
+ loss: aamsoftmax_sc_topk
+ loss_conf:
+     margin: 0.3
+     scale: 30
+     K: 3
+     mp: 0.06
+     k_top: 5
+ required:
+ - output_dir
+ version: '202310'
+ distributed: true
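Review note on `loss: aamsoftmax_sc_topk` and its `loss_conf`: the name suggests an additive angular margin softmax with sub-centers and a top-k term. The sketch below illustrates only the first two ideas for a single embedding, using `margin`, `scale`, and `K` from the config. It is a simplified reading, not ESPnet's implementation: the top-k penalty (`mp`, `k_top`) is left out, no special handling is applied when the angle plus margin exceeds π, and the function name `subcenter_aam_logits` is hypothetical.

```python
import math


def subcenter_aam_logits(embedding, subcenters, target, margin=0.3, scale=30.0):
    """Scaled logits for one embedding; subcenters[c] holds K weight vectors for class c."""

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm

    logits = []
    for class_index, centers in enumerate(subcenters):
        # Sub-center step: each class is scored by its closest sub-center.
        cos_theta = max(cosine(embedding, center) for center in centers)
        if class_index == target:
            # Angular margin step: widen the target angle by `margin` radians.
            theta = math.acos(max(-1.0, min(1.0, cos_theta)))
            cos_theta = math.cos(theta + margin)
        logits.append(scale * cos_theta)
    return logits


# Toy example with 2 classes and K = 2 sub-centers in 2 dimensions (values are made up).
toy_centers = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [-1.0, 0.0]]]
print(subcenter_aam_logits([1.0, 0.0], toy_centers, target=0))
```

The margin lowers only the target-class logit, which forces the encoder to separate speakers by more than the decision boundary strictly requires; the logits would then go through an ordinary softmax cross-entropy.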
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/backward_time.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/clip.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/eer.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/forward_time.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/gpu_max_cached_mem_GB.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/grad_norm.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/iter_time.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/loss.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/loss_scale.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/mindcf.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/n_trials.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/nontrg_mean.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/nontrg_std.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/optim0_lr0.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/optim_step_time.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/train_time.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/trg_mean.png ADDED
save_exp/spk_train_ska_Vox12_emb192_torchmelspec_subcentertopk_wavlm_nodownsample_raw_sp/images/trg_std.png ADDED