---
language:
  - "en"
thumbnail:
tags:
- audio-to-audio 
- Speech Enhancement
- Voicebank-DEMAND
- UNIVERSE
- UNIVERSE++
- Diffusion
- pytorch
- open-universe
license: "apache-2.0"
datasets:
- Voicebank-DEMAND
metrics:
- SI-SDR
- PESQ
- SIG
- BAK
- OVRL
model-index:
- name: universe++
  results:
  - task:
      name: Speech Enhancement
      type: speech-enhancement
    dataset:
      name: DEMAND
      type: demand
      split: test-set
      args:
        language: en
    metrics:
    - name: DNSMOS SIG
      type: sig
      value: '3.493'
    - name: DNSMOS BAK
      type: bak
      value: '4.042'
    - name: DNSMOS OVRL
      type: ovrl
      value: '3.205'
    - name: PESQ
      type: pesq
      value: 3.017
    - name: SI-SDR
      type: si-sdr
      value: 18.629
---
# open-universe: Generative Speech Enhancement with Score-based Diffusion and Adversarial Training

This repository contains the configurations and weights for the [UNIVERSE++](https://arxiv.org/abs/2406.12194) and
[UNIVERSE](https://arxiv.org/abs/2206.03065) models implemented in [open-universe](https://github.com/line/open-universe).

The models were trained on the [Voicebank-DEMAND](https://datashare.ed.ac.uk/handle/10283/2791) dataset at 16 kHz.

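Performance on the Voicebank-DEMAND test split is given in the following table. The columns report the scale-invariant signal-to-distortion ratio (SI-SDR, in dB), wideband PESQ, extended STOI, log-spectral distance (LSD), Levenshtein phoneme similarity (LPS), and the DNSMOS overall (OVRL), signal (SIG), and background (BAK) scores. Lower is better for LSD; higher is better for all other metrics.

| Model      | SI-SDR (dB) | PESQ-WB | ESTOI | LSD   | LPS   | DNSMOS OVRL | DNSMOS SIG | DNSMOS BAK |
|------------|------------:|--------:|------:|------:|------:|------------:|-----------:|-----------:|
| UNIVERSE++ |      18.624 |   3.017 | 0.864 | 4.867 | 0.937 |       3.200 |      3.489 |      4.040 |
| UNIVERSE   |      17.600 |   2.830 | 0.844 | 6.318 | 0.920 |       3.157 |      3.457 |      4.013 |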

## Usage

Start by installing `open-universe`.
We use conda to simplify the installation.
```sh
git clone https://github.com/line/open-universe.git
cd open-universe
conda env create -f environment.yaml
conda activate open-universe
python -m pip install .
```

The models can then be applied to a folder of audio files as follows.
```sh
# UNIVERSE++ (default model)
python -m open_universe.bin.enhance <input/folder> <output/folder> \
  --model line-corporation/open-universe:plusplus

# UNIVERSE
python -m open_universe.bin.enhance <input/folder> <output/folder> \
  --model line-corporation/open-universe:original
```
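Because both checkpoints share the same command-line entry point, a small wrapper can run them on the same input so their outputs can be compared side by side. The sketch below assumes bash and uses only the `enhance` command and the two model identifiers shown above. The function name `enhance_with_both`, the `universepp`/`universe` subfolder names, and the `ENHANCE_DRY_RUN` variable are hypothetical and not part of open-universe.

```sh
#!/usr/bin/env bash
# Hypothetical helper (not part of open-universe): enhance one input folder
# with both released models, writing each model's output to its own subfolder.
# Set ENHANCE_DRY_RUN=1 (a hypothetical variable) to print the commands
# instead of running them.
enhance_with_both() {
  if [[ $# -ne 2 ]]; then
    echo "usage: enhance_with_both <input/folder> <output/root>" >&2
    return 2
  fi
  local input_dir="$1"
  local output_root="$2"

  # Subfolder names are illustrative; model IDs are the ones listed above.
  local names=("universepp" "universe")
  local models=("line-corporation/open-universe:plusplus"
                "line-corporation/open-universe:original")

  local i out_dir
  local -a cmd
  for i in "${!names[@]}"; do
    out_dir="${output_root}/${names[$i]}"
    cmd=(python -m open_universe.bin.enhance "$input_dir" "$out_dir"
         --model "${models[$i]}")
    if [[ "${ENHANCE_DRY_RUN:-0}" == "1" ]]; then
      # Dry run: print the command as a single space-separated line.
      printf '%s\n' "${cmd[*]}"
    else
      # Create the output folder up front in case it does not exist yet.
      mkdir -p "$out_dir" || return
      # Stop at the first model whose enhancement command fails.
      "${cmd[@]}" || return
    fi
  done
}
```

For example, after sourcing the file (say, a hypothetical `enhance_both.sh`), `enhance_with_both noisy/ enhanced/` would write UNIVERSE++ outputs to `enhanced/universepp` and UNIVERSE outputs to `enhanced/universe`. Prefixing the call with `ENHANCE_DRY_RUN=1` only prints the two commands, which is a cheap way to check paths before a long run.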

## Referencing open-universe and UNIVERSE++

If you use these models in your work, please consider citing the following paper.

```bibtex
@inproceedings{universepp,
    author={Scheibler, Robin and Fujita, Yusuke and Shirahata, Yuma and Komatsu, Tatsuya},
    title={Universal Score-based Speech Enhancement with High Content Preservation},
    booktitle={Proc. Interspeech 2024},
    month=sep,
    year=2024
}
```

## Referencing UNIVERSE

```bibtex
@misc{universe,
    author={Serr{\`a}, Joan and Pascual, Santiago and Pons, Jordi and Araz, R. Oguz and Scaini, Davide},
    title={Universal Speech Enhancement with Score-based Diffusion},
    howpublished={arXiv:2206.03065},
    month=sep,
    year=2022
}
```