teticio committed on
Commit aedf71e
Parent(s): b7f49a5

consolidate notebooks

README.md CHANGED
@@ -9,14 +9,13 @@ app_file: app.py
 pinned: false
 license: gpl-3.0
 ---
-
-# audio-diffusion
+# audio-diffusion [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb)
 
 ### Apply [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) using the new Hugging Face [diffusers](https://github.com/huggingface/diffusers) package to synthesize music instead of images.
 
 ---
 
-**UPDATE**: I've trained a new [model](https://huggingface.co/teticio/audio-diffusion-breaks-256) on 30,000 samples that have been used in music, sourced from [WhoSampled](https://whosampled.com) and [YouTube](https://youtube.com). The idea is that the model could be used to generate loops or "breaks" that can be sampled to make new tracks. People ("crate diggers") go to a lot of lengths or are willing to pay a lot of money to find breaks in old records. See [`test_model_breaks.ipynb`](https://github.com/teticio/audio-diffusion/blob/main/notebooks/test_model_breaks.ipynb) for details.
+**UPDATE**: I've trained a new [model](https://huggingface.co/teticio/audio-diffusion-breaks-256) on 30,000 samples that have been used in music, sourced from [WhoSampled](https://whosampled.com) and [YouTube](https://youtube.com). The idea is that the model could be used to generate loops or "breaks" that can be sampled to make new tracks. People ("crate diggers") go to a lot of lengths or are willing to pay a lot of money to find breaks in old records.
 
 ---
 
@@ -26,10 +25,9 @@ license: gpl-3.0
 
 Audio can be represented as images by transforming to a [mel spectrogram](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum), such as the one shown above. The class `Mel` in `mel.py` can convert a slice of audio into a mel spectrogram of `x_res` x `y_res` and vice versa. The higher the resolution, the less audio information will be lost. You can see how this works in the [`test_mel.ipynb`](https://github.com/teticio/audio-diffusion/blob/main/notebooks/test_mel.ipynb) notebook.
 
-A DDPM model is trained on a set of mel spectrograms that have been generated from a directory of audio files. It is then used to synthesize similar mel spectrograms, which are then converted back into audio. See the [`test_model.ipynb`](https://github.com/teticio/audio-diffusion/blob/main/notebooks/test_model.ipynb) and [`test_model_breaks.ipynb`](https://github.com/teticio/audio-diffusion/blob/main/notebooks/test_model_breaks.ipynb) notebooks for examples.
+A DDPM model is trained on a set of mel spectrograms that have been generated from a directory of audio files. It is then used to synthesize similar mel spectrograms, which are then converted back into audio.
 
-You can play around with the model I trained on about 500 songs from my Spotify "liked" playlist on [Google Colab](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb) or [Hugging Face spaces](https://huggingface.co/spaces/teticio/audio-diffusion). Check out some automatically generated loops [here](https://soundcloud.com/teticio2/sets/audio-diffusion-loops).
-
+You can play around with the model on [Google Colab](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb) or [Hugging Face spaces](https://huggingface.co/spaces/teticio/audio-diffusion). Check out some automatically generated loops [here](https://soundcloud.com/teticio2/sets/audio-diffusion-loops).
 
 ---
 
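For reference, the audio-to-spectrogram round trip described in the README paragraph above looks roughly like the sketch below. This is a minimal illustration, not code from the repository: the `load_audio`, `audio_slice_to_image` and `image_to_audio` method names, the constructor arguments and the example file name are assumptions inferred from how the notebooks use the `Mel` class (only `get_sample_rate` appears verbatim in the diff further down).

```python
# Minimal sketch of the audio <-> mel spectrogram round trip described above.
# Method names (load_audio, audio_slice_to_image, image_to_audio) are assumed;
# check mel.py for the actual signatures.
from mel import Mel  # assumed import path when running from the repo root

mel = Mel(x_res=256, y_res=256)        # resolution of the spectrogram image
mel.load_audio("example.mp3")          # hypothetical input file
image = mel.audio_slice_to_image(0)    # first audio slice -> PIL image
audio = mel.image_to_audio(image)      # spectrogram image -> audio samples
print(len(audio), mel.get_sample_rate())
```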
notebooks/test_model.ipynb CHANGED
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "0fd939b0",
+   "id": "0a627a6f",
    "metadata": {},
    "source": [
     "<a href=\"https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
@@ -51,6 +51,26 @@
     "from audiodiffusion import AudioDiffusion"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "7fd945bb",
+   "metadata": {},
+   "source": [
+    "### Select model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "97f24046",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#@markdown teticio/audio-diffusion-256 - trained on my Spotify \"liked\" playlist\n",
+    "#@markdown teticio/audio-diffusion-256-breaks - trained on samples used in music\n",
+    "model_id = \"teticio/audio-diffusion-256\" #@param [\"teticio/audio-diffusion-256\", \"teticio/audio-diffusion-256-breaks\"]"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "011fb5a1",
@@ -61,12 +81,12 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 4,
    "id": "a3d45c36",
    "metadata": {},
    "outputs": [],
    "source": [
-    "audio_diffusion = AudioDiffusion(model_id=\"teticio/audio-diffusion-256\")"
+    "audio_diffusion = AudioDiffusion(model_id=model_id)"
    ]
   },
   {
@@ -112,7 +132,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "ds = load_dataset('teticio/audio-diffusion-256')"
+    "ds = load_dataset(model_id)"
    ]
   },
   {
@@ -168,53 +188,6 @@
     "Audio(data=audio, rate=mel.get_sample_rate())"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "946fdb4d",
-   "metadata": {},
-   "source": [
-    "### Push model to hub"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "37c0564e",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from diffusers.hub_utils import init_git_repo, push_to_hub\n",
-    "\n",
-    "\n",
-    "class AttributeDict(dict):\n",
-    "\n",
-    "    def __getattr__(self, attr):\n",
-    "        return self[attr]\n",
-    "\n",
-    "    def __setattr__(self, attr, value):\n",
-    "        self[attr] = value\n",
-    "\n",
-    "\n",
-    "args = AttributeDict({\n",
-    "    \"hub_model_id\":\n",
-    "    \"teticio/audio-diffusion-256\",\n",
-    "    \"output_dir\":\n",
-    "    \"../ddpm-ema-audio-256-repo\",\n",
-    "    \"local_rank\":\n",
-    "    -1,\n",
-    "    \"hub_token\":\n",
-    "    open(os.path.join(os.environ['HOME'], '.huggingface/token'), 'rt').read(),\n",
-    "    \"hub_private_repo\":\n",
-    "    False,\n",
-    "    \"overwrite_output_dir\":\n",
-    "    False\n",
-    "})\n",
-    "\n",
-    "repo = init_git_repo(args, at_init=True)\n",
-    "ddpm = DDPMPipeline.from_pretrained('../ddpm-ema-audio-256')\n",
-    "push_to_hub(args, ddpm, repo)"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": null,
notebooks/test_model_breaks.ipynb DELETED
The diff for this file is too large to render. See raw diff