github-actions[bot] committed on
Commit 45b4605 · 0 Parent(s)

Sync to HuggingFace Spaces

.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.wav filter=lfs diff=lfs merge=lfs -text
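The `.gitattributes` patterns route matching files through Git LFS. As a rough illustration, the sketch below checks a file name against a handful of them with Python's `fnmatch`. It is only an approximation: Git's own matching rules differ in places (for example for patterns containing `/` or `**`), so the helper name `is_lfs_candidate` and the shortened pattern list are hypothetical and cover basename patterns only.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the basename patterns listed in .gitattributes (assumption:
# directory patterns such as saved_model/**/* are deliberately left out).
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.tar.*", "*tfevents*", "*.wav"]


def is_lfs_candidate(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does the file's basename match any pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports the attribute Git actually applies.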
.github/workflows/sync.yml ADDED
@@ -0,0 +1,26 @@
+ name: Sync to Hugging Face Spaces
+
+ on:
+   push:
+     branches:
+       - main
+
+ jobs:
+   sync:
+     name: Sync
+     runs-on: ubuntu-latest
+
+     steps:
+       - name: Checkout Repository
+         uses: actions/checkout@v4
+         with:
+           lfs: true
+
+       - name: Sync to Hugging Face Spaces
+         uses: JacobLinCool/huggingface-sync@v1
+         with:
+           github: ${{ secrets.GITHUB_TOKEN }}
+           user: jacoblincool # Hugging Face username or organization name
+           space: MP-SENet # Hugging Face space name
+           token: ${{ secrets.HF_TOKEN }} # Hugging Face token
+           configuration: headers.yaml
.gitignore ADDED
@@ -0,0 +1,4 @@
+ __pycache__
+ *.pyc
+
+ .DS_Store
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2024 JacobLinCool
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md ADDED
@@ -0,0 +1,40 @@
+ ---
+ title: MP-SENet
+ short_description: MP-SENet is a speech enhancement model.
+ emoji: 🔊
+ colorFrom: green
+ colorTo: green
+ sdk: gradio
+ sdk_version: 5.4.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ fullWidth: true
+ ---
+
+ # MP-SENet Gradio App
+
+ A Gradio app for [MP-SENet](https://github.com/yxlu-0102/MP-SENet) with ZeroGPU support.
+
+ Most of the code and the model weights are from the original repository (MIT licensed), with some modifications to make it work with Gradio and ZeroGPU and handle longer audio files.
+
+ ## API Usage
+
+ You can also use the model through the Gradio API. Here's an example:
+
+ ```python
+ from gradio_client import Client, handle_file
+
+ client = Client("JacobLinCool/MP-SENet")
+
+ # `/run` takes the audio file and the segment size in seconds (see app.py).
+ output, info = client.predict(
+     input=handle_file("path/to/audio.wav"),
+     segment_size_seconds=10,
+     api_name="/run",
+ )
+ print(output)  # The path to the output file
+ ```
+
+ The default `/run` endpoint will try to acquire a GPU for 60 seconds, which should be sufficient for audio files up to 20 minutes long.
+ If you are working with audio files longer than 20 minutes, you can use the `/run2x` or `/run4x` endpoints, which will try to acquire a GPU for 120 and 240 seconds, respectively.
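The endpoint guidance in the README can be turned into a small client-side helper. This is a minimal sketch, not part of the commit: the function name `pick_endpoint` is hypothetical, and the 40-minute cutoff between `/run2x` and `/run4x` is an assumption that GPU time scales linearly with audio length (the README only states the 20-minute figure for `/run`).

```python
def pick_endpoint(duration_seconds: float) -> str:
    """Pick an API endpoint name from the audio duration in seconds."""
    minutes = duration_seconds / 60
    if minutes <= 20:
        return "/run"  # 60 s of GPU time, per the README
    if minutes <= 40:
        return "/run2x"  # 120 s; the 40-minute cutoff is an assumption
    return "/run4x"  # 240 s
```

The returned string can be passed as `api_name` to `Client.predict`.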
app.py ADDED
@@ -0,0 +1,176 @@
+ import os
+
+ os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
+
+ import time
+ import librosa
+ import spaces
+ from librosa.display import specshow
+ import numpy as np
+ from accelerate import Accelerator
+ import matplotlib.pyplot as plt
+ import gradio as gr
+ from typing import Tuple
+ from MPSENet import MPSENet
+
+ accelerator = Accelerator()
+ device = accelerator.device
+ print(f"Using device: {device}")
+
+ model = MPSENet.from_pretrained("JacobLinCool/MP-SENet-DNS").to(device)
+
+
+ def plot_spec(y: np.ndarray, title: str = "Spectrogram") -> plt.Figure:
+     y[np.isnan(y)] = 0
+     y[np.isinf(y)] = 0
+     stft = librosa.stft(
+         y, n_fft=model.h.n_fft, hop_length=model.h.hop_size, win_length=model.h.win_size
+     )
+     D = librosa.amplitude_to_db(np.abs(stft), ref=np.max)
+
+     fig = plt.figure(figsize=(10, 4))
+     specshow(
+         D,
+         sr=model.sampling_rate,
+         n_fft=model.h.n_fft,
+         hop_length=model.h.hop_size,
+         win_length=model.h.win_size,
+         y_axis="linear",
+         x_axis="time",
+         cmap="viridis",
+     )
+     plt.title(title)
+     plt.tight_layout()
+
+     return fig
+
+
+ def plot_input(input: str) -> plt.Figure:
+     wav, _ = librosa.load(input, sr=model.sampling_rate)
+     return plot_spec(wav, title="Original Spectrogram")
+
+
+ def plot_output(output: Tuple[int, np.ndarray]) -> plt.Figure:
+     wav = output[1].astype(np.float32) / 32768.0
+     return plot_spec(wav, title="Processed Spectrogram")
+
+
+ def process_audio(
+     input: str,
+     segment_size_seconds: int,
+ ) -> Tuple[Tuple[int, np.ndarray], str]:
+     # Load the audio
+     start_time = time.time()
+     noisy_wav, sr = librosa.load(input, sr=model.sampling_rate)
+     print(f"{noisy_wav.shape=}, {sr=}")
+     print(f"Loaded audio in {time.time() - start_time:.2f} seconds")
+
+     # Process the audio
+     start_time = time.time()
+     processed_wav, sr, notation = model(
+         noisy_wav, segment_size=segment_size_seconds * 16000
+     )
+     print(f"{processed_wav.shape=}, {sr=}, {notation=}")
+     print(f"Inference in {time.time() - start_time:.2f} seconds")
+
+     return ((sr, processed_wav), "Processed.")
+
+
+ @spaces.GPU()
+ def run(input: str, segment_size_seconds: int):
+     return process_audio(input, segment_size_seconds)
+
+
+ @spaces.GPU(duration=60 * 2)
+ def run2x(input: str, segment_size_seconds: int):
+     return process_audio(input, segment_size_seconds)
+
+
+ @spaces.GPU(duration=60 * 4)
+ def run4x(input: str, segment_size_seconds: int):
+     return process_audio(input, segment_size_seconds)
+
+
+ with gr.Blocks() as app:
+     gr.Markdown(
+         "# MP-SENet Speech Enhancement\n\n[MP-SENet](https://github.com/yxlu-0102/MP-SENet) with ZeroGPU support.\n"
+         "> Package is available at [JacobLinCool/MPSENet](https://github.com/JacobLinCool/MPSENet)"
+     )
+
+     with gr.Row():
+         with gr.Column():
+             input = gr.Audio(
+                 label="Upload an audio file", type="filepath", show_download_button=True
+             )
+         with gr.Column():
+             original_spec = gr.Plot(label="Original Spectrogram")
+
+     with gr.Row():
+         btn = gr.Button(value="Process", variant="primary")
+     with gr.Row():
+         info = gr.Markdown("Press the button to process the audio.")
+
+     with gr.Row():
+         with gr.Column():
+             output = gr.Audio(
+                 label="Processed Audio", show_download_button=True
+             )
+         with gr.Column():
+             processed_spec = gr.Plot(label="Processed Spectrogram")
+
+     with gr.Accordion("Advanced Settings", open=False):
+         segment_size = gr.Slider(
+             minimum=1,
+             maximum=20,
+             value=10,
+             step=1,
+             label="Segment Size (seconds)",
+             info="The audio will be processed in segments of this size. Larger segments take more memory but may give more consistent results.",
+         )
+
+     input.change(
+         fn=plot_input,
+         inputs=[input],
+         outputs=[original_spec],
+     )
+     output.change(
+         fn=plot_output,
+         inputs=[output],
+         outputs=[processed_spec],
+     )
+
+     btn.click(
+         fn=run,
+         inputs=[input, segment_size],
+         outputs=[output, info],
+         api_name="run",
+     )
+
+     gr.Examples(
+         examples=[
+             ["examples/p226_007.wav", 2],
+             ["examples/p226_016.wav", 2],
+             ["examples/p230_005.wav", 8],
+             ["examples/p232_032.wav", 2],
+             ["examples/p232_232.wav", 2],
+         ],
+         inputs=[input, segment_size],
+     )
+
+     btn2x = gr.Button(value="Process", variant="primary", visible=False)
+     btn2x.click(
+         fn=run2x,
+         inputs=[input, segment_size],
+         outputs=[output, info],
+         api_name="run2x",
+     )
+
+     btn4x = gr.Button(value="Process", variant="primary", visible=False)
+     btn4x.click(
+         fn=run4x,
+         inputs=[input, segment_size],
+         outputs=[output, info],
+         api_name="run4x",
+     )
+
+ app.launch()
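`app.py` converts the slider value to a sample count (`segment_size_seconds * 16000`) and passes it to the model as `segment_size`. How the `MPSENet` package splits the waveform internally is not shown in this commit, so the following is only a generic illustration of fixed-size segmentation; the helper name `segment_bounds` and the non-overlapping layout are assumptions.

```python
from typing import List, Tuple


def segment_bounds(
    n_samples: int, segment_seconds: int, sample_rate: int = 16000
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of consecutive, non-overlapping segments."""
    segment_size = segment_seconds * sample_rate
    if segment_size <= 0:
        raise ValueError("segment_seconds and sample_rate must be positive")
    # The final segment is shorter when n_samples is not a multiple of segment_size.
    return [
        (start, min(start + segment_size, n_samples))
        for start in range(0, n_samples, segment_size)
    ]
```

With the default slider value of 10 seconds at 16 kHz, each full segment spans 160,000 samples.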
examples/p226_007.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6376b335503e50d05ef0f1958e7b70e405cc3e8cf532d1aa0522b7c034e67220
+ size 155884
examples/p226_016.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ebb4c4fc1080393f2eaf7810b1f72d98c6f0d08ab644955673f26347795cd342
+ size 248044
examples/p230_005.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0137aaa3ef743832b042948dde655b86509276aea5ee9f4f3fb9f84710a27107
+ size 236658
examples/p232_032.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d044aece24c6a71f184ab9029c7cbf3c930923425d7ff426387b24f789766fe0
+ size 111726
examples/p232_232.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2d4ddde57a48639395955eb8eac3fe2c7d7687384e3753a484e0247c0693535e
+ size 137004
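The example `.wav` entries are Git LFS pointer files rather than audio data: three `key value` lines giving the spec version, the object's SHA-256 digest, and its size in bytes. A minimal sketch of reading one, assuming exactly the three keys shown in this diff (the function name `parse_lfs_pointer` is hypothetical):

```python
# Pointer text copied from examples/p226_007.wav in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:6376b335503e50d05ef0f1958e7b70e405cc3e8cf532d1aa0522b7c034e67220
size 155884
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    # Each line is "<key> <value>"; split on the first space only.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }
```

Checking out the workflow with `lfs: true` replaces these pointers with the real audio before the sync step runs.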
headers.yaml ADDED
@@ -0,0 +1,11 @@
+ title: MP-SENet
+ short_description: MP-SENet is a speech enhancement model.
+ emoji: 🔊
+ colorFrom: green
+ colorTo: green
+ sdk: gradio
+ sdk_version: 5.4.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ fullWidth: true
requirements.txt ADDED
@@ -0,0 +1,10 @@
+ torch==2.2.0
+ soundfile==0.12.1
+ numpy==1.26.0
+ librosa==0.9.2
+ einops==0.8.0
+ gradio==5.4.0
+ accelerate==0.31.0
+ matplotlib==3.8.3
+ spaces
+ MPSENet