wyz committed 745b126 (verified) · Parent(s): b00d709

Update README.md

Files changed (1): README.md (+55 -8)
@@ -5,7 +5,7 @@ tags:
 - audio-to-audio
 language: en
 datasets:
-- universal_se
 license: cc-by-4.0
 ---

@@ -13,22 +13,69 @@ license: cc-by-4.0

 ### `wyz/vctk_bsrnn_xtiny_noncausal`

-This model was trained by Emrys365 using universal_se recipe in [espnet](https://github.com/espnet/espnet/).

 ### Demo: How to use in ESPnet2

 Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
 if you haven't done that already.

-```bash
-cd espnet
-git checkout 443028662106472c60fe8bd892cb277e5b488651
-pip install -e .
-cd egs2/universal_se/enh1
-./run.sh --skip_data_prep false --skip_train true --download_model wyz/vctk_bsrnn_xtiny_noncausal
 ```

 ## ENH config
@@ -5,7 +5,7 @@ tags:
 - audio-to-audio
 language: en
 datasets:
+- VCTK_DEMAND
 license: cc-by-4.0
 ---

@@ -13,22 +13,69 @@ license: cc-by-4.0

 ### `wyz/vctk_bsrnn_xtiny_noncausal`

+This model was trained by wyz based on the universal_se_v1 recipe in [espnet](https://github.com/espnet/espnet/).

 ### Demo: How to use in ESPnet2

 Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
 if you haven't done that already.

+To use the model in the Python interface, you could use the following code:
+
+```python
+import soundfile as sf
+from espnet2.bin.enh_inference import SeparateSpeech
+
+# For model downloading + loading
+model = SeparateSpeech.from_pretrained(
+    model_tag="wyz/vctk_bsrnn_xtiny_noncausal",
+    normalize_output_wav=True,
+    device="cuda",
+)
+# For loading a downloaded model
+# model = SeparateSpeech(
+#     train_config="exp_vctk/enh_train_enh_bsrnn_xtiny_noncausal_raw/config.yaml",
+#     model_file="exp_vctk/enh_train_enh_bsrnn_xtiny_noncausal_raw/xxxx.pth",
+#     normalize_output_wav=True,
+#     device="cuda",
+# )
+
+audio, fs = sf.read("/path/to/noisy/utt1.flac")
+enhanced = model(audio[None, :], fs=fs)[0]
 ```
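In the snippet above, `normalize_output_wav=True` asks the inference wrapper to rescale each enhanced waveform before returning it. As a rough illustration of the idea only (a toy sketch assuming simple peak normalization, not ESPnet's actual implementation), such a step can look like:

```python
import numpy as np

def peak_normalize(wav: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Rescale a waveform so its largest absolute sample is close to 1.0.

    Toy sketch only: the real normalize_output_wav behavior may differ in detail.
    """
    peak = np.max(np.abs(wav))
    return wav / (peak + eps)

noisy = np.array([0.1, -0.5, 0.25, 0.0])
normalized = peak_normalize(noisy)
```

Normalizing like this helps avoid clipping when the enhanced signal is written to a fixed-range audio file.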

+<!-- Generated by ./scripts/utils/show_enh_score.sh -->
+# RESULTS
+## Environments
+- date: `Wed Feb 28 10:35:20 EST 2024`
+- python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
+- espnet version: `espnet 202304`
+- pytorch version: `pytorch 2.0.1+cu118`
+- Git hash: `443028662106472c60fe8bd892cb277e5b488651`
+- Commit date: `Thu May 11 03:32:59 2023 +0000`
+
+## enhanced_test_16k
+
+|dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+|---|---|---|---|---|---|---|---|---|---|---|
+|chime4_et05_real_isolated_6ch_track|1.13|49.79|-3.56|-3.56|0.00|-31.36|2.56|2.92|3.58|3.26|
+|chime4_et05_simu_isolated_6ch_track|1.24|75.50|5.95|5.95|0.00|0.62|2.50|2.83|3.67|2.96|
+|dns20_tt_synthetic_no_reverb|2.24|94.56|14.53|14.53|0.00|14.29|3.10|3.52|3.73|3.75|
+|reverb_et_real_8ch_multich|1.11|56.20|4.48|4.48|0.00|0.98|2.34|2.67|3.62|3.00|
+|reverb_et_simu_8ch_multich|1.60|80.55|8.51|8.51|0.00|-11.30|2.88|3.24|3.76|3.56|
+|whamr_tt_mix_single_reverb_max_16k|1.25|74.62|3.86|3.86|0.00|-0.29|2.45|2.78|3.66|3.19|
+
+## enhanced_test_48k
+
+|dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+|---|---|---|---|---|---|---|---|---|---|
+|vctk_noisy_tt_2spk|94.56|20.02|20.02|0.00|18.94|3.10|3.42|3.93|3.47|
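Several columns in the result tables above, SI_SNR in particular, are scale-invariant metrics. As a reference point, here is a small self-contained sketch of the standard SI-SNR definition (written from the commonly used formula, not extracted from the actual `show_enh_score.sh` scoring script):

```python
import numpy as np

def si_snr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-noise ratio in dB (standard definition)."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to get the scaled target signal.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return float(10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)))

rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)          # stand-in for a clean reference signal
est = ref + 0.1 * rng.standard_normal(16000)  # stand-in for an enhanced output
# Rescaling the estimate does not change the score, hence "scale-invariant".
assert abs(si_snr(ref, est) - si_snr(ref, 3.0 * est)) < 1e-3
```

Scale invariance matters here because enhancement models are free to change the overall output gain without affecting perceived quality.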
+
 ## ENH config