Sync to HuggingFace Spaces
Commit 45b4605 · committed by github-actions[bot] · 0 parent(s)

- .gitattributes +36 -0
- .github/workflows/sync.yml +26 -0
- .gitignore +4 -0
- LICENSE +21 -0
- README.md +40 -0
- app.py +176 -0
- examples/p226_007.wav +3 -0
- examples/p226_016.wav +3 -0
- examples/p230_005.wav +3 -0
- examples/p232_032.wav +3 -0
- examples/p232_232.wav +3 -0
- headers.yaml +11 -0
- requirements.txt +10 -0
.gitattributes
ADDED
@@ -0,0 +1,36 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+*.wav filter=lfs diff=lfs merge=lfs -text
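The `.gitattributes` rules above route matching files through Git LFS. As a rough illustration only, the sketch below approximates that matching in Python for a handful of the patterns. It assumes that slash-free patterns are compared against the file name alone, and it uses `fnmatchcase`, which is not identical to Git's own matcher. The helper name `is_lfs_tracked` is hypothetical and not part of this repository.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A few of the slash-free patterns from the .gitattributes above (not the full list).
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.tar.*", "*tfevents*", "*.wav"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: match each pattern against the file name only."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


print(is_lfs_tracked("examples/p226_007.wav"))  # an example clip from this commit
print(is_lfs_tracked("app.py"))                 # plain source file
```

For an authoritative answer in a real checkout, `git check-attr filter -- <path>` reports the filter that Git itself would apply.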
.github/workflows/sync.yml
ADDED
@@ -0,0 +1,26 @@
+name: Sync to Hugging Face Spaces
+
+on:
+  push:
+    branches:
+      - main
+
+jobs:
+  sync:
+    name: Sync
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout Repository
+        uses: actions/checkout@v4
+        with:
+          lfs: true
+
+      - name: Sync to Hugging Face Spaces
+        uses: JacobLinCool/huggingface-sync@v1
+        with:
+          github: ${{ secrets.GITHUB_TOKEN }}
+          user: jacoblincool # Hugging Face username or organization name
+          space: MP-SENet # Hugging Face space name
+          token: ${{ secrets.HF_TOKEN }} # Hugging Face token
+          configuration: headers.yaml
.gitignore
ADDED
@@ -0,0 +1,4 @@
+__pycache__
+*.pyc
+
+.DS_Store
LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 JacobLinCool
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
README.md
ADDED
@@ -0,0 +1,40 @@
+---
+title: MP-SENet
+short_description: MP-SENet is a speech enhancement model.
+emoji: 🔊
+colorFrom: green
+colorTo: green
+sdk: gradio
+sdk_version: 5.4.0
+app_file: app.py
+pinned: false
+license: mit
+fullWidth: true
+---
+
+# MP-SENet Gradio App
+
+A Gradio app for [MP-SENet](https://github.com/yxlu-0102/MP-SENet) with ZeroGPU support.
+
+Most of the code and the model weights come from the original repository (MIT licensed), modified to work with Gradio and ZeroGPU and to handle longer audio files.
+
+## API Usage
+
+You can also use the model through the Gradio API. Here's an example:
+
+```python
+from gradio_client import Client, handle_file
+
+client = Client("JacobLinCool/MP-SENet")
+
+task_id, _ = client.predict(
+    input=handle_file("path/to/audio.wav"),
+    plot=False,
+    api_name="/preprocess",
+)
+output, _, _, _ = client.predict(task_id=task_id, api_name="/run")
+print(output) # The path to the output file
+```
+
+The default `/run` endpoint will try to acquire a GPU for 60 seconds, which should be sufficient for audio files up to 20 minutes long.
+If you are working with audio files longer than 20 minutes, you can use the `/run2x` or `/run4x` endpoints, which will try to acquire a GPU for 120 and 240 seconds respectively.
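The README's guidance on choosing between `/run`, `/run2x`, and `/run4x` can be sketched as a small helper. This is only an illustration: the README states the 20-minute figure for `/run`, while the 40-minute cutoff for `/run2x` is an assumption that processing time scales linearly with the GPU budget. The function name `pick_endpoint` is hypothetical.

```python
def pick_endpoint(duration_seconds: float) -> str:
    """Pick an API endpoint name from the audio duration.

    The 20-minute limit for /run comes from the README; the 40-minute
    limit for /run2x is an ASSUMPTION (linear scaling of the GPU budget).
    """
    minutes = duration_seconds / 60
    if minutes <= 20:
        return "/run"      # 60-second GPU budget
    if minutes <= 40:
        return "/run2x"    # 120-second GPU budget
    return "/run4x"        # 240-second GPU budget


print(pick_endpoint(5 * 60))   # short clip
print(pick_endpoint(30 * 60))  # half-hour recording
print(pick_endpoint(70 * 60))  # long recording
```

The returned string is meant to be passed as the `api_name` argument of `client.predict`.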
app.py
ADDED
@@ -0,0 +1,176 @@
+import os
+
+os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
+
+import time
+import librosa
+import spaces
+from librosa.display import specshow
+import numpy as np
+from accelerate import Accelerator
+import matplotlib.pyplot as plt
+import gradio as gr
+from typing import Tuple
+from MPSENet import MPSENet
+
+accelerator = Accelerator()
+device = accelerator.device
+print(f"Using device: {device}")
+
+model = MPSENet.from_pretrained("JacobLinCool/MP-SENet-DNS").to(device)
+
+
+def plot_spec(y: np.ndarray, title: str = "Spectrogram") -> plt.Figure:
+    y[np.isnan(y)] = 0
+    y[np.isinf(y)] = 0
+    stft = librosa.stft(
+        y, n_fft=model.h.n_fft, hop_length=model.h.hop_size, win_length=model.h.win_size
+    )
+    D = librosa.amplitude_to_db(np.abs(stft), ref=np.max)
+
+    fig = plt.figure(figsize=(10, 4))
+    specshow(
+        D,
+        sr=model.sampling_rate,
+        n_fft=model.h.n_fft,
+        hop_length=model.h.hop_size,
+        win_length=model.h.win_size,
+        y_axis="linear",
+        x_axis="time",
+        cmap="viridis",
+    )
+    plt.title(title)
+    plt.tight_layout()
+
+    return fig
+
+
+def plot_input(input: str) -> plt.Figure:
+    wav, _ = librosa.load(input, sr=model.sampling_rate)
+    return plot_spec(wav, title="Original Spectrogram")
+
+
+def plot_output(output: Tuple[int, np.ndarray]) -> plt.Figure:
+    wav = output[1].astype(np.float32) / 32768.0
+    return plot_spec(wav, title="Processed Spectrogram")
+
+
+def process_audio(
+    input: str,
+    segment_size_seconds: int,
+) -> Tuple[Tuple[int, np.ndarray], str]:
+    # Load the audio
+    start_time = time.time()
+    noisy_wav, sr = librosa.load(input, sr=model.sampling_rate)
+    print(f"{noisy_wav.shape=}, {sr=}")
+    print(f"Loaded audio in {time.time() - start_time:.2f} seconds")
+
+    # Process the audio
+    start_time = time.time()
+    processed_wav, sr, notation = model(
+        noisy_wav, segment_size=segment_size_seconds * 16000
+    )
+    print(f"{processed_wav.shape=}, {sr=}, {notation=}")
+    print(f"Inference in {time.time() - start_time:.2f} seconds")
+
+    return ((sr, processed_wav), "Processed.")
+
+
+@spaces.GPU()
+def run(input: str, segment_size_seconds: int):
+    return process_audio(input, segment_size_seconds)
+
+
+@spaces.GPU(duration=60 * 2)
+def run2x(input: str, segment_size_seconds: int):
+    return process_audio(input, segment_size_seconds)
+
+
+@spaces.GPU(duration=60 * 4)
+def run4x(input: str, segment_size_seconds: int):
+    return process_audio(input, segment_size_seconds)
+
+
+with gr.Blocks() as app:
+    gr.Markdown(
+        "# MP-SENet Speech Enhancement\n\n[MP-SENet](https://github.com/yxlu-0102/MP-SENet) with ZeroGPU support.\n"
+        "> Package is available at [JacobLinCool/MPSENet](https://github.com/JacobLinCool/MPSENet)"
+    )
+
+    with gr.Row():
+        with gr.Column():
+            input = gr.Audio(
+                label="Upload an audio file", type="filepath", show_download_button=True
+            )
+        with gr.Column():
+            original_spec = gr.Plot(label="Original Spectrogram")
+
+    with gr.Row():
+        btn = gr.Button(value="Process", variant="primary")
+    with gr.Row():
+        info = gr.Markdown("Press the button to process the audio.")
+
+    with gr.Row():
+        with gr.Column():
+            output = gr.Audio(
+                label="Processed Audio", show_download_button=True
+            )
+        with gr.Column():
+            processed_spec = gr.Plot(label="Processed Spectrogram")
+
+    with gr.Accordion("Advanced Settings", open=False):
+        segment_size = gr.Slider(
+            minimum=1,
+            maximum=20,
+            value=10,
+            step=1,
+            label="Segment Size (seconds)",
+            info="The audio will be processed in segments of this size. Larger segments take more memory but may give more consistent results.",
+        )
+
+    input.change(
+        fn=plot_input,
+        inputs=[input],
+        outputs=[original_spec],
+    )
+    output.change(
+        fn=plot_output,
+        inputs=[output],
+        outputs=[processed_spec],
+    )
+
+    btn.click(
+        fn=run,
+        inputs=[input, segment_size],
+        outputs=[output, info],
+        api_name="run",
+    )
+
+    gr.Examples(
+        examples=[
+            ["examples/p226_007.wav", 2],
+            ["examples/p226_016.wav", 2],
+            ["examples/p230_005.wav", 8],
+            ["examples/p232_032.wav", 2],
+            ["examples/p232_232.wav", 2],
+        ],
+        inputs=[input, segment_size],
+    )
+
+    btn2x = gr.Button(value="Process", variant="primary", visible=False)
+    btn2x.click(
+        fn=run2x,
+        inputs=[input, segment_size],
+        outputs=[output, info],
+        api_name="run2x",
+    )
+
+    btn4x = gr.Button(value="Process", variant="primary", visible=False)
+    btn4x.click(
+        fn=run4x,
+        inputs=[input, segment_size],
+        outputs=[output, info],
+        api_name="run4x",
+    )
+
+app.launch()
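`process_audio` passes `segment_size=segment_size_seconds * 16000` to the model, so the "Segment Size" slider is converted from seconds to samples at 16 kHz. The sketch below illustrates how a waveform of a given length might be cut into segments of that size. It is an assumption-laden illustration: the MPSENet package may segment differently (for example with overlap or padding), and `segment_bounds` is a hypothetical helper, not part of the package.

```python
from typing import List, Tuple

SAMPLE_RATE = 16000  # same constant that process_audio multiplies by


def segment_bounds(
    num_samples: int, segment_size_seconds: int, sample_rate: int = SAMPLE_RATE
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of consecutive, non-overlapping segments."""
    if segment_size_seconds <= 0:
        raise ValueError("segment_size_seconds must be positive")
    segment_size = segment_size_seconds * sample_rate
    return [
        (start, min(start + segment_size, num_samples))
        for start in range(0, num_samples, segment_size)
    ]


# A 25-second clip with the slider's default of 10 seconds:
print(segment_bounds(25 * SAMPLE_RATE, 10))
# [(0, 160000), (160000, 320000), (320000, 400000)]
```

The last segment is shorter than the others whenever the clip length is not a multiple of the segment size.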
examples/p226_007.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6376b335503e50d05ef0f1958e7b70e405cc3e8cf532d1aa0522b7c034e67220
+size 155884
examples/p226_016.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ebb4c4fc1080393f2eaf7810b1f72d98c6f0d08ab644955673f26347795cd342
+size 248044
examples/p230_005.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0137aaa3ef743832b042948dde655b86509276aea5ee9f4f3fb9f84710a27107
+size 236658
examples/p232_032.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d044aece24c6a71f184ab9029c7cbf3c930923425d7ff426387b24f789766fe0
+size 111726
examples/p232_232.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2d4ddde57a48639395955eb8eac3fe2c7d7687384e3753a484e0247c0693535e
+size 137004
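The five `.wav` entries above are Git LFS pointer files, not audio data: each holds a spec version, a SHA-256 object ID, and the byte size of the real file. A minimal sketch of reading such a pointer follows. The helper name `parse_lfs_pointer` is hypothetical, and the parser assumes only the three `key value` lines seen in this commit.

```python
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer contents of examples/p226_007.wav from this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:6376b335503e50d05ef0f1958e7b70e405cc3e8cf532d1aa0522b7c034e67220
size 155884
"""

info = parse_lfs_pointer(pointer)
algorithm, digest = info["oid"].split(":", 1)
print(algorithm, digest[:8], int(info["size"]))
```

This explains why the workflow checks out the repository with `lfs: true`: without it, the example files would be these small pointers instead of the actual recordings.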
headers.yaml
ADDED
@@ -0,0 +1,11 @@
+title: MP-SENet
+short_description: MP-SENet is a speech enhancement model.
+emoji: 🔊
+colorFrom: green
+colorTo: green
+sdk: gradio
+sdk_version: 5.4.0
+app_file: app.py
+pinned: false
+license: mit
+fullWidth: true
requirements.txt
ADDED
@@ -0,0 +1,10 @@
+torch==2.2.0
+soundfile==0.12.1
+numpy==1.26.0
+librosa==0.9.2
+einops==0.8.0
+gradio==5.4.0
+accelerate==0.31.0
+matplotlib==3.8.3
+spaces
+MPSENet