Wangyou Zhang committed
Commit 6c78535
1 Parent(s): 5cfcadc

Update model

README.md CHANGED
@@ -1,3 +1,241 @@
- ---
- license: cc-by-4.0
- ---
---
tags:
- espnet
- audio
- audio-to-audio
language:
datasets:
- chime4
license: cc-by-4.0
---

## ESPnet2 ENH model

This model was trained by Wangyou Zhang using the chime4 recipe in [espnet](https://github.com/espnet/espnet/).

### Demo: How to use in ESPnet2

```bash
cd espnet
pip install -e .
cd egs2/chime4/enh1
# pass the model name to --download_model
./run.sh --skip_data_prep false --skip_train true --download_model
```
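
The recipe above drives inference through `run.sh`. For quick experimentation outside the recipe, the same checkpoint can be loaded from Python. The snippet below is a minimal sketch rather than part of this model card: it assumes the `SeparateSpeech` interface from `espnet2.bin.enh_inference` in ESPnet ~0.9.7, uses the file paths recorded in `meta.yaml` at the bottom of this commit, and `mixture.wav` is a hypothetical 6-channel 16 kHz recording; keyword names may differ in other ESPnet versions.

```python
# Hedged sketch (not from the original card): load this repository's checkpoint
# with espnet2's enhancement inference helper and enhance one multichannel WAV.
import soundfile as sf
from espnet2.bin.enh_inference import SeparateSpeech

separate_speech = SeparateSpeech(
    # Paths as recorded in meta.yaml of this repository.
    train_config="exp/enh_train_enh_beamformer_mvdr_raw/config.yaml",
    model_file="exp/enh_train_enh_beamformer_mvdr_raw/11epoch.pth",
    device="cpu",
)

# "mixture.wav" is a hypothetical 6-channel, 16 kHz CHiME-4 mixture.
mixture, fs = sf.read("mixture.wav")                   # (num_samples, num_channels)
enhanced = separate_speech(mixture[None, ...], fs=fs)  # list of enhanced waveforms
sf.write("enhanced.wav", enhanced[0].squeeze(), fs)
```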

## ENH config

<details><summary>expand</summary>

```
config: conf/tuning/train_enh_beamformer_mvdr.yaml
print_config: false
log_level: INFO
dry_run: false
iterator_type: sequence
output_dir: exp/enh_train_enh_beamformer_mvdr_raw
ngpu: 1
seed: 0
num_workers: 4
num_att_plot: 3
dist_backend: nccl
dist_init_method: env://
dist_world_size: 2
dist_rank: 0
local_rank: 0
dist_master_addr: localhost
dist_master_port: 35841
dist_launcher: null
multiprocessing_distributed: true
cudnn_enabled: true
cudnn_benchmark: false
cudnn_deterministic: true
collect_stats: false
write_collected_feats: false
max_epoch: 70
patience: 4
val_scheduler_criterion:
- valid
- loss
early_stopping_criterion:
- valid
- loss
- min
best_model_criterion:
- - valid
  - si_snr
  - max
- - valid
  - loss
  - min
keep_nbest_models: 1
grad_clip: 5.0
grad_clip_type: 2.0
grad_noise: false
accum_grad: 1
no_forward_run: false
resume: true
train_dtype: float32
use_amp: false
log_interval: null
unused_parameters: false
use_tensorboard: true
use_wandb: false
wandb_project: null
wandb_id: null
pretrain_path: null
init_param: []
freeze_param: []
num_iters_per_epoch: null
batch_size: 8
valid_batch_size: null
batch_bins: 1000000
valid_batch_bins: null
train_shape_file:
- exp/enh_stats_16k/train/speech_mix_shape
- exp/enh_stats_16k/train/speech_ref1_shape
- exp/enh_stats_16k/train/noise_ref1_shape
valid_shape_file:
- exp/enh_stats_16k/valid/speech_mix_shape
- exp/enh_stats_16k/valid/speech_ref1_shape
- exp/enh_stats_16k/valid/noise_ref1_shape
batch_type: folded
valid_batch_type: null
fold_length:
- 80000
- 80000
- 80000
sort_in_batch: descending
sort_batch: descending
multiple_iterator: false
chunk_length: 500
chunk_shift_ratio: 0.5
num_cache_chunks: 1024
train_data_path_and_name_and_type:
- - dump/raw/tr05_simu_isolated_6ch_track/wav.scp
  - speech_mix
  - sound
- - dump/raw/tr05_simu_isolated_6ch_track/spk1.scp
  - speech_ref1
  - sound
- - dump/raw/tr05_simu_isolated_6ch_track/noise1.scp
  - noise_ref1
  - sound
valid_data_path_and_name_and_type:
- - dump/raw/dt05_simu_isolated_6ch_track/wav.scp
  - speech_mix
  - sound
- - dump/raw/dt05_simu_isolated_6ch_track/spk1.scp
  - speech_ref1
  - sound
- - dump/raw/dt05_simu_isolated_6ch_track/noise1.scp
  - noise_ref1
  - sound
allow_variable_data_keys: false
max_cache_size: 0.0
max_cache_fd: 32
valid_max_cache_size: null
optim: adam
optim_conf:
  lr: 0.001
  eps: 1.0e-08
  weight_decay: 0
scheduler: reducelronplateau
scheduler_conf:
  mode: min
  factor: 0.5
  patience: 1
init: xavier_uniform
model_conf:
  loss_type: mask_mse
  mask_type: PSM^2
use_preprocessor: false
encoder: stft
encoder_conf:
  n_fft: 512
  hop_length: 128
separator: wpe_beamformer
separator_conf:
  num_spk: 1
  loss_type: mask_mse
  use_wpe: false
  wnet_type: blstmp
  wlayers: 3
  wunits: 300
  wprojs: 320
  wdropout_rate: 0.0
  taps: 5
  delay: 3
  use_dnn_mask_for_wpe: true
  use_beamformer: true
  bnet_type: blstmp
  blayers: 3
  bunits: 512
  bprojs: 512
  badim: 320
  ref_channel: 3
  use_noise_mask: true
  beamformer_type: mvdr_souden
  bdropout_rate: 0.0
decoder: stft
decoder_conf:
  n_fft: 512
  hop_length: 128
required:
- output_dir
version: 0.9.7
distributed: true
```

</details>
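
Two settings in the config above determine the time-frequency resolution that the mask estimator and the MVDR (Souden) beamformer operate on: the STFT encoder/decoder use `n_fft: 512` and `hop_length: 128`, i.e. 32 ms analysis windows with an 8 ms shift and 257 frequency bins at CHiME-4's 16 kHz sampling rate. A small self-contained sketch (assuming PyTorch, not taken from the recipe) makes those numbers concrete:

```python
# Hedged sketch: shape of the time-frequency representation implied by
# n_fft=512 / hop_length=128 for one second of 16 kHz audio.
import torch

fs = 16000
mixture = torch.randn(fs)  # 1 s of dummy single-channel audio

spec = torch.stft(
    mixture,
    n_fft=512,                      # 512 / 16000 s = 32 ms window
    hop_length=128,                 # 128 / 16000 s = 8 ms shift
    window=torch.hann_window(512),
    return_complex=True,
)
print(spec.shape)  # torch.Size([257, 126]): 257 frequency bins x ~126 frames
```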

### Citing ESPnet

```bibtex
@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}

@inproceedings{li2021espnetse,
  title={{ESPnet-SE}: End-to-End Speech Enhancement and Separation Toolkit Designed for {ASR} Integration},
  author={Li, Chenda and Shi, Jing and Zhang, Wangyou and Subramanian, Aswin Shanmugam and Chang, Xuankai and Kamo, Naoyuki and Hira, Moto and Hayashi, Tomoki and Boeddeker, Christoph and Chen, Zhuo and Watanabe, Shinji},
  booktitle={Proc. IEEE Spoken Language Technology Workshop (SLT)},
  pages={785--792},
  year={2021},
}
```

or arXiv:

```bibtex
@misc{watanabe2018espnet,
  title={ESPnet: End-to-End Speech Processing Toolkit},
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  year={2018},
  eprint={1804.00015},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@inproceedings{li2021espnetse,
  title={{ESPnet-SE}: End-to-End Speech Enhancement and Separation Toolkit Designed for {ASR} Integration},
  author={Li, Chenda and Shi, Jing and Zhang, Wangyou and Subramanian, Aswin Shanmugam and Chang, Xuankai and Kamo, Naoyuki and Hira, Moto and Hayashi, Tomoki and Boeddeker, Christoph and Chen, Zhuo and Watanabe, Shinji},
  year={2020},
  eprint={2011.03706},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
```

exp/enh_stats_16k/train/feats_stats.npz ADDED
Binary file (742 Bytes).
 
exp/enh_train_enh_beamformer_mvdr_raw/11epoch.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:418c3c7ab136dea6ae4cfbd9084493dc8e18e95b68b60af9368d194955063eff
size 53613220
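
The checkpoint above is stored through Git LFS, so the file in the repository is only a pointer; `oid` is the SHA-256 of the actual ~54 MB checkpoint. After fetching the real object (for example with `git lfs pull`), the download can be verified against the pointer. A short sketch using Python's standard `hashlib`:

```python
# Verify the git-lfs object against the sha256 oid from its pointer file.
import hashlib

EXPECTED_OID = "418c3c7ab136dea6ae4cfbd9084493dc8e18e95b68b60af9368d194955063eff"
PATH = "exp/enh_train_enh_beamformer_mvdr_raw/11epoch.pth"

h = hashlib.sha256()
with open(PATH, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
        h.update(chunk)

assert h.hexdigest() == EXPECTED_OID, "checksum mismatch: was the LFS object pulled?"
print("checkpoint verified:", PATH)
```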
exp/enh_train_enh_beamformer_mvdr_raw/RESULTS.TXT ADDED
@@ -0,0 +1,20 @@
<!-- Generated by ./scripts/utils/show_enh_score.sh -->
# RESULTS
## Environments
- date: `Thu Jan 28 22:28:34 CST 2021`
- python version: `3.6.3 |Anaconda, Inc.| (default, Nov 20 2017, 20:41:42) [GCC 7.2.0]`
- espnet version: `espnet 0.9.7`
- pytorch version: `pytorch 1.6.0`
- Git hash: `eddccfa7a296fcf8fd744200d7e8113f9a3c7e69`
- Commit date: `Thu Jan 21 21:42:24 2021 +0800`

## enh_train_enh_beamformer_mvdr_raw

config: conf/tuning/train_enh_beamformer_mvdr.yaml

|dataset|PESQ|STOI|SAR|SDR|SIR|SI_SNR|
|---|---|---|---|---|---|---|
|enhanced_dt05_simu_isolated_6ch_track|2.60262|0.945147|13.6748|13.6748|0|12.5195|
|enhanced_et05_simu_isolated_6ch_track|2.63531|0.950153|15.5108|15.5108|0|14.6525|
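
Model selection uses validation SI-SNR (`best_model_criterion` in the config), and SI_SNR is also the last column of the table above. As a reminder of what the metric measures, here is a short numpy sketch of the standard scale-invariant SNR between an enhanced signal and its reference; it is illustrative only, not ESPnet's exact implementation:

```python
# Hedged sketch of scale-invariant SNR (SI-SNR) in dB; not ESPnet's exact code.
import numpy as np

def si_snr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    # Remove DC offset so the measure ignores constant shifts.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference (scale-invariant target).
    s_target = (np.dot(estimate, reference) / (np.dot(reference, reference) + eps)) * reference
    e_noise = estimate - s_target
    return 10 * np.log10((np.sum(s_target**2) + eps) / (np.sum(e_noise**2) + eps))

# Toy usage with synthetic signals (real evaluation uses the enhanced/reference WAVs).
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
est = ref + 0.1 * rng.standard_normal(16000)
print(f"SI-SNR: {si_snr(est, ref):.2f} dB")
```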
exp/enh_train_enh_beamformer_mvdr_raw/config.yaml ADDED
@@ -0,0 +1,157 @@
(The 157 added lines are identical to the ENH config shown in README.md above.)
exp/enh_train_enh_beamformer_mvdr_raw/images/backward_time.png ADDED
exp/enh_train_enh_beamformer_mvdr_raw/images/forward_time.png ADDED
exp/enh_train_enh_beamformer_mvdr_raw/images/iter_time.png ADDED
exp/enh_train_enh_beamformer_mvdr_raw/images/loss.png ADDED
exp/enh_train_enh_beamformer_mvdr_raw/images/lr_0.png ADDED
exp/enh_train_enh_beamformer_mvdr_raw/images/optim_step_time.png ADDED
exp/enh_train_enh_beamformer_mvdr_raw/images/si_snr.png ADDED
exp/enh_train_enh_beamformer_mvdr_raw/images/train_time.png ADDED
meta.yaml ADDED
@@ -0,0 +1,8 @@
espnet: 0.9.7
files:
  model_file: exp/enh_train_enh_beamformer_mvdr_raw/11epoch.pth
python: "3.6.3 |Anaconda, Inc.| (default, Nov 20 2017, 20:41:42) \n[GCC 7.2.0]"
timestamp: 1644422365.226252
torch: 1.6.0
yaml_files:
  train_config: exp/enh_train_enh_beamformer_mvdr_raw/config.yaml