lichenda committed on
Commit
1d83301
1 Parent(s): 897786b

Update model

README.md CHANGED
@@ -1,3 +1,249 @@
  ---
- license: cc
+ tags:
+ - espnet
+ - audio
+ - audio-to-audio
+ language: en
+ datasets:
+ - wsj0_2mix
+ license: cc-by-4.0
  ---
+
+ ## ESPnet2 ENH model
+
+ ### `lichenda/Chenda_Li_wsj0_2mix_enh_dprnn_tasnet`
+
+ This model was trained by LiChenda using the wsj0_2mix recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ ```bash
+ cd espnet
+ git checkout 54919e2529d6f58f4550d4a72960f57b83f66dc9
+ pip install -e .
+ cd egs2/wsj0_2mix/enh1
+ ./run.sh --skip_data_prep false --skip_train true --download_model lichenda/Chenda_Li_wsj0_2mix_enh_dprnn_tasnet
+ ```
+
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Thu Apr 15 00:03:19 CST 2021`
+ - python version: `3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]`
+ - espnet version: `espnet 0.9.8`
+ - pytorch version: `pytorch 1.5.0`
+ - Git hash: `2aa2f151b5929dc9ffa4df39a8d8c26ca4dbdb85`
+ - Commit date: `Tue Mar 30 09:08:27 2021 +0900`
+
+
+ ## enh_train_enh_dprnn_tasnet_raw
+
+ config: conf/tuning/train_enh_dprnn_tasnet.yaml
+
+ |dataset|STOI|SAR|SDR|SIR|
+ |---|---|---|---|---|
+ |enhanced_cv_min_8k|0.960037|19.0476|18.5438|29.1591|
+ |enhanced_tt_min_8k|0.968376|18.8209|18.2925|28.929|
+
+ ## ENH config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_enh_dprnn_tasnet.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: chunk
+ output_dir: exp/enh_train_enh_dprnn_tasnet_raw
+ ngpu: 1
+ seed: 0
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 45126
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 150
+ patience: 4
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ - - valid
+   - si_snr
+   - max
+ - - valid
+   - loss
+   - min
+ keep_nbest_models: 1
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 4
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/enh_stats_8k/train/speech_mix_shape
+ - exp/enh_stats_8k/train/speech_ref1_shape
+ - exp/enh_stats_8k/train/speech_ref2_shape
+ valid_shape_file:
+ - exp/enh_stats_8k/valid/speech_mix_shape
+ - exp/enh_stats_8k/valid/speech_ref1_shape
+ - exp/enh_stats_8k/valid/speech_ref2_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 80000
+ - 80000
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 32000
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ - - dump/raw/tr_min_8k/wav.scp
+   - speech_mix
+   - sound
+ - - dump/raw/tr_min_8k/spk1.scp
+   - speech_ref1
+   - sound
+ - - dump/raw/tr_min_8k/spk2.scp
+   - speech_ref2
+   - sound
+ valid_data_path_and_name_and_type:
+ - - dump/raw/cv_min_8k/wav.scp
+   - speech_mix
+   - sound
+ - - dump/raw/cv_min_8k/spk1.scp
+   - speech_ref1
+   - sound
+ - - dump/raw/cv_min_8k/spk2.scp
+   - speech_ref2
+   - sound
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 0.001
+     eps: 1.0e-08
+     weight_decay: 0
+ scheduler: reducelronplateau
+ scheduler_conf:
+     mode: min
+     factor: 0.7
+     patience: 1
+ init: xavier_uniform
+ model_conf:
+     loss_type: si_snr
+ use_preprocessor: false
+ encoder: conv
+ encoder_conf:
+     channel: 64
+     kernel_size: 2
+     stride: 1
+ separator: dprnn
+ separator_conf:
+     num_spk: 2
+     layer: 6
+     rnn_type: lstm
+     bidirectional: true
+     nonlinear: relu
+     unit: 128
+     segment_size: 250
+     dropout: 0.1
+ decoder: conv
+ decoder_conf:
+     channel: 64
+     kernel_size: 2
+     stride: 1
+ required:
+ - output_dir
+ version: 0.9.8
+ distributed: true
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```bibtex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+ @inproceedings{ESPnet-SE,
+   author = {Chenda Li and Jing Shi and Wangyou Zhang and Aswin Shanmugam Subramanian and Xuankai Chang and
+   Naoyuki Kamo and Moto Hira and Tomoki Hayashi and Christoph B{\"{o}}ddeker and Zhuo Chen and Shinji Watanabe},
+   title = {ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for {ASR} Integration},
+   booktitle = {{IEEE} Spoken Language Technology Workshop, {SLT} 2021, Shenzhen, China, January 19-22, 2021},
+   pages = {785--792},
+   publisher = {{IEEE}},
+   year = {2021},
+   url = {https://doi.org/10.1109/SLT48900.2021.9383615},
+   doi = {10.1109/SLT48900.2021.9383615},
+   timestamp = {Mon, 12 Apr 2021 17:08:59 +0200},
+   biburl = {https://dblp.org/rec/conf/slt/Li0ZSCKHHBC021.bib},
+   bibsource = {dblp computer science bibliography, https://dblp.org}
+ }
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
exp/enh_stats_8k/train/feats_stats.npz ADDED
Binary file (778 Bytes). View file
exp/enh_train_enh_dprnn_tasnet_raw/96epoch.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:338bc12bf9db30b178247f8b0b3ecbc24b1eff7739c4771f01aaaf1d456c5212
+ size 10393743
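
The three lines added for `96epoch.pth` are a Git LFS pointer, not the checkpoint itself: they record the hash algorithm, the digest, and the byte size of the real file. A minimal sketch of how a downloaded checkpoint could be checked against that pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of Git LFS or ESPnet, and the local path in the final comment is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Return True if the file's size and hash agree with the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in 1 MiB blocks so large checkpoints are not loaded at once.
        for block in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(block)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:338bc12bf9db30b178247f8b0b3ecbc24b1eff7739c4771f01aaaf1d456c5212
size 10393743
"""

pointer = parse_lfs_pointer(POINTER)
# Assumed local path; adjust to wherever the checkpoint was downloaded:
# matches_pointer(Path("exp/enh_train_enh_dprnn_tasnet_raw/96epoch.pth"), pointer)
```

Checking the size first is a cheap early exit; the hash is only computed when the sizes agree.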
exp/enh_train_enh_dprnn_tasnet_raw/RESULTS.md ADDED
@@ -0,0 +1,20 @@
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Thu Apr 15 00:03:19 CST 2021`
+ - python version: `3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]`
+ - espnet version: `espnet 0.9.8`
+ - pytorch version: `pytorch 1.5.0`
+ - Git hash: `2aa2f151b5929dc9ffa4df39a8d8c26ca4dbdb85`
+ - Commit date: `Tue Mar 30 09:08:27 2021 +0900`
+
+
+ ## enh_train_enh_dprnn_tasnet_raw
+
+ config: conf/tuning/train_enh_dprnn_tasnet.yaml
+
+ |dataset|STOI|SAR|SDR|SIR|
+ |---|---|---|---|---|
+ |enhanced_cv_min_8k|0.960037|19.0476|18.5438|29.1591|
+ |enhanced_tt_min_8k|0.968376|18.8209|18.2925|28.929|
+
exp/enh_train_enh_dprnn_tasnet_raw/config.yaml ADDED
@@ -0,0 +1,147 @@
+ config: conf/tuning/train_enh_dprnn_tasnet.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: chunk
+ output_dir: exp/enh_train_enh_dprnn_tasnet_raw
+ ngpu: 1
+ seed: 0
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 45126
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 150
+ patience: 4
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ - - valid
+   - si_snr
+   - max
+ - - valid
+   - loss
+   - min
+ keep_nbest_models: 1
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 4
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/enh_stats_8k/train/speech_mix_shape
+ - exp/enh_stats_8k/train/speech_ref1_shape
+ - exp/enh_stats_8k/train/speech_ref2_shape
+ valid_shape_file:
+ - exp/enh_stats_8k/valid/speech_mix_shape
+ - exp/enh_stats_8k/valid/speech_ref1_shape
+ - exp/enh_stats_8k/valid/speech_ref2_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 80000
+ - 80000
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 32000
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ - - dump/raw/tr_min_8k/wav.scp
+   - speech_mix
+   - sound
+ - - dump/raw/tr_min_8k/spk1.scp
+   - speech_ref1
+   - sound
+ - - dump/raw/tr_min_8k/spk2.scp
+   - speech_ref2
+   - sound
+ valid_data_path_and_name_and_type:
+ - - dump/raw/cv_min_8k/wav.scp
+   - speech_mix
+   - sound
+ - - dump/raw/cv_min_8k/spk1.scp
+   - speech_ref1
+   - sound
+ - - dump/raw/cv_min_8k/spk2.scp
+   - speech_ref2
+   - sound
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adam
+ optim_conf:
+     lr: 0.001
+     eps: 1.0e-08
+     weight_decay: 0
+ scheduler: reducelronplateau
+ scheduler_conf:
+     mode: min
+     factor: 0.7
+     patience: 1
+ init: xavier_uniform
+ model_conf:
+     loss_type: si_snr
+ use_preprocessor: false
+ encoder: conv
+ encoder_conf:
+     channel: 64
+     kernel_size: 2
+     stride: 1
+ separator: dprnn
+ separator_conf:
+     num_spk: 2
+     layer: 6
+     rnn_type: lstm
+     bidirectional: true
+     nonlinear: relu
+     unit: 128
+     segment_size: 250
+     dropout: 0.1
+ decoder: conv
+ decoder_conf:
+     channel: 64
+     kernel_size: 2
+     stride: 1
+ required:
+ - output_dir
+ version: 0.9.8
+ distributed: true
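
The config above trains with `loss_type: si_snr` and keeps the checkpoint with the highest validation `si_snr`. For readers unfamiliar with the metric, here is a minimal pure-Python sketch of the common textbook definition of scale-invariant SNR. It is an illustration only, not ESPnet's implementation: the function name `si_snr` and the `eps` handling are assumptions, and the training code may differ in details such as how estimated sources are matched to the two reference speakers.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)

    # Remove the mean so a constant offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(a * b for a, b in zip(e, r))
    ref_energy = sum(b * b for b in r) + eps
    scale = dot / ref_energy
    target = [scale * b for b in r]

    # Whatever is left after removing the target counts as noise.
    noise = [a - t for a, t in zip(e, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise) + eps
    return 10.0 * math.log10(target_energy / noise_energy + eps)
```

Because the target is obtained by projection, multiplying the estimate by any nonzero gain leaves the score essentially unchanged, which is why the metric is preferred over plain SNR for separation models whose output level is arbitrary.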
exp/enh_train_enh_dprnn_tasnet_raw/images/backward_time.png ADDED
exp/enh_train_enh_dprnn_tasnet_raw/images/forward_time.png ADDED
exp/enh_train_enh_dprnn_tasnet_raw/images/gpu_max_cached_mem_GB.png ADDED
exp/enh_train_enh_dprnn_tasnet_raw/images/iter_time.png ADDED
exp/enh_train_enh_dprnn_tasnet_raw/images/loss.png ADDED
exp/enh_train_enh_dprnn_tasnet_raw/images/optim0_lr0.png ADDED
exp/enh_train_enh_dprnn_tasnet_raw/images/optim_step_time.png ADDED
exp/enh_train_enh_dprnn_tasnet_raw/images/si_snr.png ADDED
exp/enh_train_enh_dprnn_tasnet_raw/images/train_time.png ADDED
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: 0.10.7a1
+ files:
+   model_file: exp/enh_train_enh_dprnn_tasnet_raw/96epoch.pth
+ python: "3.7.11 (default, Jul 27 2021, 14:32:16) \n[GCC 7.5.0]"
+ timestamp: 1649682775.265407
+ torch: 1.8.1
+ yaml_files:
+   train_config: exp/enh_train_enh_dprnn_tasnet_raw/config.yaml
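
`meta.yaml` describes the environment in which the model was packaged (espnet 0.10.7a1, torch 1.8.1), which differs from the training environment recorded in RESULTS.md (espnet 0.9.8, pytorch 1.5.0). The short sketch below decodes the `timestamp` field to make the packaging date readable. It assumes the value is seconds since the Unix epoch, which the file itself does not state.

```python
from datetime import datetime, timezone

# Value of `timestamp` in meta.yaml; assumed to be seconds since the Unix epoch.
META_TIMESTAMP = 1649682775.265407

packed_at = datetime.fromtimestamp(META_TIMESTAMP, tz=timezone.utc)
print(packed_at.isoformat(timespec="seconds"))  # 2022-04-11T13:12:55+00:00
```

Under that assumption the model was packaged in April 2022, roughly a year after the April 2021 date in the results environment.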