teticio committed on
Commit 72c877e
1 Parent(s): b929114

add gradio app

Files changed (4)
  1. README.md +1 -1
  2. app.py +38 -0
  3. notebooks/test-model.ipynb +0 -0
  4. requirements.txt +5 -8
README.md CHANGED
@@ -6,7 +6,7 @@
 
 ![mel spectrogram](mel.png)
 
-Audio can be represented as images by transforming to a [mel spectrogram](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum), such as the one shown above. The class `Mel` in `mel.py` can convert a slice of audio into a mel spectrogram of `x_res` x `y_res` and vice-versa. The higher the resolution, the less audio information will be lost. You can see how this works in the `test-mel.ipynb` notebook.
+Audio can be represented as images by transforming to a [mel spectrogram](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum), such as the one shown above. The class `Mel` in `mel.py` can convert a slice of audio into a mel spectrogram of `x_res` x `y_res` and vice versa. The higher the resolution, the less audio information will be lost. You can see how this works in the `test-mel.ipynb` notebook.
 
 A DDPM model is trained on a set of mel spectrograms that have been generated from a directory of audio files. It is then used to synthesize similar mel spectrograms, which are then converted back into audio. See the `test-model.ipynb` notebook for an example.
 
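To make the round trip concrete, here is a minimal sketch of using `Mel` in both directions. The constructor arguments and the `image_to_audio`/`get_sample_rate` calls are taken from `app.py` in this commit; the `load_audio` and `audio_slice_to_image` method names are assumptions about `mel.py` and may differ from the real API.

```python
from src.mel import Mel

# Higher x_res/y_res keeps more audio detail in the spectrogram image.
mel = Mel(x_res=256, y_res=256)

# Audio -> image direction (hypothetical method names; see mel.py for the real API).
mel.load_audio("example.wav")          # assumption: load and slice an audio file
image = mel.audio_slice_to_image(0)    # assumption: slice index -> greyscale PIL image

# Image -> audio direction, as used verbatim in app.py below.
audio = mel.image_to_audio(image)      # reconstruct a waveform from the spectrogram
print(mel.get_sample_rate(), len(audio))
```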
 
app.py ADDED
@@ -0,0 +1,38 @@
+import argparse
+
+import gradio as gr
+from PIL import Image
+from diffusers import DDPMPipeline
+
+from src.mel import Mel
+
+mel = Mel(x_res=256, y_res=256)
+model_id = "teticio/audio-diffusion-256"
+ddpm = DDPMPipeline.from_pretrained(model_id)
+
+
+def generate_spectrogram_and_audio():
+    images = ddpm(output_type="numpy")["sample"]
+    images = (images * 255).round().astype("uint8").transpose(0, 3, 1, 2)
+    image = Image.fromarray(images[0][0])
+    audio = mel.image_to_audio(image)
+    return image, (mel.get_sample_rate(), audio)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--port", type=int)
+    parser.add_argument("--server", type=str)  # bind address is a host name, not an int
+    args = parser.parse_args()
+
+    demo = gr.Interface(
+        fn=generate_spectrogram_and_audio,
+        title="Audio Diffusion",
+        description="Generate audio using Hugging Face diffusers",
+        inputs=[],
+        outputs=[
+            gr.Image(label="Mel spectrogram", image_mode="L"),
+            gr.Audio(label="Audio"),
+        ],
+    )
+    demo.launch(server_name=args.server or "0.0.0.0", server_port=args.port)
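As a usage note: with the dependencies in `requirements.txt` below plus `gradio` and the repository's `src/mel.py` available, running `python app.py` should launch the interface on Gradio's default port (7860) bound to `0.0.0.0`; the `--port` and `--server` flags override the port and bind address.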
notebooks/test-model.ipynb CHANGED
The diff for this file is too large to render. See raw diff
 
requirements.txt CHANGED
@@ -1,8 +1,5 @@
-torch==1.12.1
-torchvision==0.13.1
-numpy==1.22.4
-Pillow==9.2.0
-accelerate==0.12.0
-datasets==2.4.0
-diffusers==0.1.3
-tqdm==4.64.0
+# for Hugging Face spaces
+torch
+numpy
+Pillow
+diffusers