Commit 85b2091 · committed by pplantinga · 1 parent: 0c1da84

Add PESQ 3.15 model

Files changed (3)
  1. README.md +76 -0
  2. enhance_model.ckpt +0 -0
  3. hyperparams.yaml +40 -0
README.md ADDED
@@ -0,0 +1,76 @@
+ ---
+ language: "en"
+ tags:
+ - Speech Enhancement
+ - PyTorch
+ license: "apache-2.0"
+ datasets:
+ - Voicebank
+ - DEMAND
+ metrics:
+ - PESQ
+ - STOI
+ ---
+
+ # MetricGAN-trained model for Enhancement
+
+ This repository provides the tools needed to perform speech enhancement with
+ SpeechBrain. For a better experience, we encourage you to learn more about
+ [SpeechBrain](https://speechbrain.github.io). The model's performance is:
+
+ | Release  | Test PESQ | Test STOI |
+ |:--------:|:---------:|:---------:|
+ | 21-04-27 | 3.15      | 93.0      |
+
+ ## Install SpeechBrain
+
+ First, install SpeechBrain with the following command:
+
+ ```
+ pip install speechbrain
+ ```
+
+ We also encourage you to read our tutorials and learn more about
+ [SpeechBrain](https://speechbrain.github.io).
+
+ ## Pretrained Usage
+
+ To use the MetricGAN+-trained model for enhancement, use the following code:
+
+ ```python
+ from speechbrain.pretrained import SpectralMaskEnhancement
+
+ enhance_model = SpectralMaskEnhancement.from_hparams(
+     source="speechbrain/metricgan-plus-voicebank",
+     savedir="pretrained_models/metricgan-plus-voicebank",
+ )
+ enhance_model.enhance_file("/path/to/file.wav")
+ ```
+
+ ## Referencing MetricGAN+
+
+ If you find MetricGAN+ useful, please cite:
+
+ ```
+ @article{fu2021metricgan+,
+   title={MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement},
+   author={Fu, Szu-Wei and Yu, Cheng and Hsieh, Tsun-An and Plantinga, Peter and Ravanelli, Mirco and Lu, Xugang and Tsao, Yu},
+   journal={arXiv preprint arXiv:2104.03538},
+   year={2021}
+ }
+ ```
+
+ ## Referencing SpeechBrain
+
+ If you find SpeechBrain useful, please cite:
+
+ ```
+ @misc{SB2021,
+   author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Chou, Ju-Chieh and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
+   title = {SpeechBrain},
+   year = {2021},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   howpublished = {\url{https://github.com/speechbrain/speechbrain}},
+ }
+ ```
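The README's usage snippet passes a WAV path to `enhance_file`. For a quick smoke test without a recording at hand, the following minimal sketch writes one second of a noisy sine tone at 16 kHz, the `sample_rate` set in `hyperparams.yaml`. The helper `write_noisy_tone` and the file name `noisy_test.wav` are hypothetical and not part of this commit. It uses only the Python standard library, and a synthetic tone says nothing about enhancement quality on real speech.

```python
import math
import random
import struct
import wave

SAMPLE_RATE = 16000  # matches sample_rate in hyperparams.yaml


def write_noisy_tone(path, seconds=1.0, freq=440.0, noise=0.05, seed=0):
    """Write a mono 16-bit WAV containing a sine tone plus white noise.

    Hypothetical helper for smoke-testing; returns the number of samples.
    """
    rng = random.Random(seed)
    n_samples = int(SAMPLE_RATE * seconds)
    frames = bytearray()
    for n in range(n_samples):
        clean = 0.5 * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
        noisy = clean + noise * rng.uniform(-1.0, 1.0)
        # Clip to [-1, 1] and convert to signed 16-bit little-endian PCM.
        sample = max(-1.0, min(1.0, noisy))
        frames += struct.pack("<h", int(sample * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(frames))
    return n_samples


n_written = write_noisy_tone("noisy_test.wav")
```

The resulting path can then be passed to `enhance_model.enhance_file("noisy_test.wav")` in place of `/path/to/file.wav`.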
enhance_model.ckpt ADDED
Binary file (7.59 MB)
 
hyperparams.yaml ADDED
@@ -0,0 +1,40 @@
+ # STFT parameters
+ sample_rate: 16000
+ win_length: 32
+ hop_length: 16
+ n_fft: 512
+ window_fn: !name:torch.hamming_window
+
+ compute_stft: !new:speechbrain.processing.features.STFT
+     sample_rate: !ref <sample_rate>
+     n_fft: !ref <n_fft>
+     win_length: !ref <win_length>
+     hop_length: !ref <hop_length>
+     window_fn: !ref <window_fn>
+
+ compute_istft: !new:speechbrain.processing.features.ISTFT
+     sample_rate: !ref <sample_rate>
+     n_fft: !ref <n_fft>
+     win_length: !ref <win_length>
+     hop_length: !ref <hop_length>
+     window_fn: !ref <window_fn>
+
+ spectral_magnitude: !name:speechbrain.processing.features.spectral_magnitude
+     power: 0.5
+
+ resynth: !name:speechbrain.processing.signal_processing.resynthesize
+     stft: !ref <compute_stft>
+     istft: !ref <compute_istft>
+
+ enhance_model: !new:speechbrain.lobes.models.MetricGAN.EnhancementGenerator
+     input_size: !ref <n_fft> // 2 + 1
+     hidden_size: 200
+     num_layers: 2
+     dropout: 0
+
+ modules:
+     enhance_model: !ref <enhance_model>
+
+ pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
+     loadables:
+         enhance_model: !ref <enhance_model>
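The STFT values in `hyperparams.yaml` are easier to read once converted to samples. The sketch below works through that arithmetic in plain Python, assuming that SpeechBrain's `STFT` interprets `win_length` and `hop_length` in milliseconds (as its docstring describes) and that `spectral_magnitude` with `power: 0.5` raises the summed squares of the real and imaginary parts to that power. The helper names `ms_to_samples` and `magnitude` are hypothetical illustrations, not SpeechBrain functions.

```python
# Values copied from hyperparams.yaml.
sample_rate = 16000  # Hz
win_length_ms = 32   # assumed to be milliseconds
hop_length_ms = 16   # assumed to be milliseconds
n_fft = 512


def ms_to_samples(duration_ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(sample_rate / 1000.0 * duration_ms))


win_samples = ms_to_samples(win_length_ms, sample_rate)
hop_samples = ms_to_samples(hop_length_ms, sample_rate)

# A real-valued signal yields n_fft // 2 + 1 non-redundant frequency bins,
# which is the input_size expression given to EnhancementGenerator.
input_size = n_fft // 2 + 1


def magnitude(real, imag, power=0.5):
    """Return (real^2 + imag^2) ** power; power=0.5 gives the magnitude."""
    return (real ** 2 + imag ** 2) ** power


print(win_samples, hop_samples, input_size)  # 512 256 257
print(magnitude(3.0, 4.0))  # 5.0
```

Under these assumptions the window spans exactly `n_fft` samples with 50% overlap between frames, and the generator receives 257 magnitude bins per frame.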