Gpagejr12 committed on
Commit 6d11b7f
1 Parent(s): c8ad64e

Upload 2 files

Files changed (2)
  1. demos_audiogen_demo.ipynb +175 -0
  2. demos_musicgen_demo.ipynb +232 -0
demos_audiogen_demo.ipynb ADDED
@@ -0,0 +1,175 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# AudioGen\n",
+ "Welcome to AudioGen's demo Jupyter notebook. Here you will find a series of self-contained examples of how to use AudioGen in different settings.\n",
+ "\n",
+ "First, we initialize AudioGen. For now, we provide only a medium-sized AudioGen model: `facebook/audiogen-medium`, a 1.5B transformer decoder.\n",
+ "\n",
+ "**Important note:** This variant differs from the original AudioGen model presented in [\"AudioGen: Textually-guided audio generation\"](https://arxiv.org/abs/2209.15352): its architecture is similar to MusicGen, with a smaller frame rate and multiple streams of tokens, which reduces generation time."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from audiocraft.models import AudioGen\n",
+ "\n",
+ "model = AudioGen.get_pretrained('facebook/audiogen-medium')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Next, let us configure the generation parameters. Specifically, you can control the following:\n",
+ "* `use_sampling` (bool, optional): use sampling if True, else do argmax decoding. Defaults to True.\n",
+ "* `top_k` (int, optional): sample only among the `top_k` most likely tokens. Defaults to 250.\n",
+ "* `top_p` (float, optional): nucleus (top-p) sampling threshold; when set to 0, `top_k` is used instead. Defaults to 0.0.\n",
+ "* `temperature` (float, optional): softmax temperature parameter. Defaults to 1.0.\n",
+ "* `duration` (float, optional): duration of the generated waveform, in seconds. Defaults to 10.0.\n",
+ "* `cfg_coef` (float, optional): coefficient used for classifier-free guidance. Defaults to 3.0.\n",
+ "\n",
+ "Any parameter not passed to `set_generation_params` falls back to its default value."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model.set_generation_params(\n",
+ "    use_sampling=True,\n",
+ "    top_k=250,\n",
+ "    duration=5\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can now generate sound using one of the following modes:\n",
+ "* Audio continuation using `model.generate_continuation`\n",
+ "* Text-conditional samples using `model.generate`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Audio Continuation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import math\n",
+ "import torchaudio\n",
+ "import torch\n",
+ "from audiocraft.utils.notebook import display_audio\n",
+ "\n",
+ "def get_bip_bip(bip_duration=0.125, frequency=440,\n",
+ "                duration=0.5, sample_rate=16000, device=\"cuda\"):\n",
+ "    \"\"\"Generates a series of beeps at the given frequency.\"\"\"\n",
+ "    t = torch.arange(\n",
+ "        int(duration * sample_rate), device=device, dtype=torch.float) / sample_rate\n",
+ "    wav = torch.cos(2 * math.pi * frequency * t)[None]\n",
+ "    tp = (t % (2 * bip_duration)) / (2 * bip_duration)\n",
+ "    envelope = (tp >= 0.5).float()\n",
+ "    return wav * envelope"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Here we use a synthetic signal to prompt the generated audio.\n",
+ "res = model.generate_continuation(\n",
+ "    get_bip_bip(0.125).expand(2, -1, -1),\n",
+ "    16000, ['Whistling with wind blowing',\n",
+ "            'Typing on a typewriter'],\n",
+ "    progress=True)\n",
+ "display_audio(res, 16000)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# You can also use any audio from a file. Make sure to trim the file if it is too long!\n",
+ "prompt_waveform, prompt_sr = torchaudio.load(\"../assets/sirens_and_a_humming_engine_approach_and_pass.mp3\")\n",
+ "prompt_duration = 2\n",
+ "prompt_waveform = prompt_waveform[..., :int(prompt_duration * prompt_sr)]\n",
+ "output = model.generate_continuation(prompt_waveform, prompt_sample_rate=prompt_sr, progress=True)\n",
+ "display_audio(output, sample_rate=16000)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Text-conditional Generation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from audiocraft.utils.notebook import display_audio\n",
+ "\n",
+ "output = model.generate(\n",
+ "    descriptions=[\n",
+ "        'Subway train blowing its horn',\n",
+ "        'A cat meowing',\n",
+ "    ],\n",
+ "    progress=True\n",
+ ")\n",
+ "display_audio(output, sample_rate=16000)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.7"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+ }
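The notebook above lists the generation parameters `use_sampling`, `top_k`, `top_p` and `temperature`. Together they control how each next token is picked from the model's output distribution. The sketch below shows that selection step in plain Python for illustration only. AudioCraft performs this step internally on PyTorch tensors, and the helper name `sample_next_token` and its signature are hypothetical, not part of its API. The sketch follows the precedence described in the notebook: argmax when sampling is off, nucleus sampling when `top_p > 0`, otherwise top-k.

```python
import math
import random
from typing import List, Optional


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits: List[float], use_sampling: bool = True,
                      temperature: float = 1.0, top_k: int = 250,
                      top_p: float = 0.0,
                      rng: Optional[random.Random] = None) -> int:
    """Hypothetical helper mirroring the documented generation parameters.

    - use_sampling=False (or temperature <= 0): greedy argmax decoding.
    - top_p > 0: nucleus sampling; top_k is ignored.
    - otherwise, top_k > 0: sample among the top_k most likely tokens.
    """
    if not use_sampling or temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    # Token indices sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_p > 0.0:
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        kept: List[int] = []
        cumulative = 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
    elif top_k > 0:
        kept = order[:top_k]
    else:
        kept = order
    # random.choices renormalises the weights of the kept tokens itself.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
print(sample_next_token(logits, use_sampling=False))  # 0 (argmax)
print(sample_next_token(logits, top_k=1))  # 0: only one candidate is kept
print(sample_next_token(logits, top_k=2, rng=random.Random(0)))  # 0 or 1
```

A lower `temperature` sharpens the distribution before truncation. A smaller `top_k` or `top_p` restricts the candidate set, so the output becomes more conservative.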
demos_musicgen_demo.ipynb ADDED
@@ -0,0 +1,232 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# MusicGen\n",
+ "Welcome to MusicGen's demo Jupyter notebook. Here you will find a series of self-contained examples of how to use MusicGen in different settings.\n",
+ "\n",
+ "First, we initialize MusicGen. You can choose a model from the following selection:\n",
+ "1. `facebook/musicgen-small` - 300M transformer decoder.\n",
+ "2. `facebook/musicgen-medium` - 1.5B transformer decoder.\n",
+ "3. `facebook/musicgen-melody` - 1.5B transformer decoder that also supports melody conditioning.\n",
+ "4. `facebook/musicgen-large` - 3.3B transformer decoder.\n",
+ "\n",
+ "This demonstration uses the `facebook/musicgen-small` variant."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from audiocraft.models import MusicGen\n",
+ "from audiocraft.models import MultiBandDiffusion\n",
+ "\n",
+ "USE_DIFFUSION_DECODER = False\n",
+ "# Using the small model; better results would be obtained with `medium` or `large`.\n",
+ "model = MusicGen.get_pretrained('facebook/musicgen-small')\n",
+ "if USE_DIFFUSION_DECODER:\n",
+ "    mbd = MultiBandDiffusion.get_mbd_musicgen()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Next, let us configure the generation parameters. Specifically, you can control the following:\n",
+ "* `use_sampling` (bool, optional): use sampling if True, else do argmax decoding. Defaults to True.\n",
+ "* `top_k` (int, optional): sample only among the `top_k` most likely tokens. Defaults to 250.\n",
+ "* `top_p` (float, optional): nucleus (top-p) sampling threshold; when set to 0, `top_k` is used instead. Defaults to 0.0.\n",
+ "* `temperature` (float, optional): softmax temperature parameter. Defaults to 1.0.\n",
+ "* `duration` (float, optional): duration of the generated waveform, in seconds. Defaults to 30.0.\n",
+ "* `cfg_coef` (float, optional): coefficient used for classifier-free guidance. Defaults to 3.0.\n",
+ "\n",
+ "Any parameter not passed to `set_generation_params` falls back to its default value."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model.set_generation_params(\n",
+ "    use_sampling=True,\n",
+ "    top_k=250,\n",
+ "    duration=30\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can now generate music using one of the following modes:\n",
+ "* Unconditional samples using `model.generate_unconditional`\n",
+ "* Music continuation using `model.generate_continuation`\n",
+ "* Text-conditional samples using `model.generate`\n",
+ "* Melody-conditional samples using `model.generate_with_chroma`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Music Continuation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import math\n",
+ "import torchaudio\n",
+ "import torch\n",
+ "from audiocraft.utils.notebook import display_audio\n",
+ "\n",
+ "def get_bip_bip(bip_duration=0.125, frequency=440,\n",
+ "                duration=0.5, sample_rate=32000, device=\"cuda\"):\n",
+ "    \"\"\"Generates a series of beeps at the given frequency.\"\"\"\n",
+ "    t = torch.arange(\n",
+ "        int(duration * sample_rate), device=device, dtype=torch.float) / sample_rate\n",
+ "    wav = torch.cos(2 * math.pi * frequency * t)[None]\n",
+ "    tp = (t % (2 * bip_duration)) / (2 * bip_duration)\n",
+ "    envelope = (tp >= 0.5).float()\n",
+ "    return wav * envelope"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Here we use a synthetic signal to prompt both the tonality and the BPM\n",
+ "# of the generated audio.\n",
+ "res = model.generate_continuation(\n",
+ "    get_bip_bip(0.125).expand(2, -1, -1),\n",
+ "    32000, ['Jazz jazz and only jazz',\n",
+ "            'Heartful EDM with beautiful synths and chords'],\n",
+ "    progress=True)\n",
+ "display_audio(res, 32000)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# You can also use any audio from a file. Make sure to trim the file if it is too long!\n",
+ "prompt_waveform, prompt_sr = torchaudio.load(\"../assets/bach.mp3\")\n",
+ "prompt_duration = 2\n",
+ "prompt_waveform = prompt_waveform[..., :int(prompt_duration * prompt_sr)]\n",
+ "output = model.generate_continuation(prompt_waveform, prompt_sample_rate=prompt_sr, progress=True, return_tokens=True)\n",
+ "display_audio(output[0], sample_rate=32000)\n",
+ "if USE_DIFFUSION_DECODER:\n",
+ "    out_diffusion = mbd.tokens_to_wav(output[1])\n",
+ "    display_audio(out_diffusion, sample_rate=32000)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Text-conditional Generation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from audiocraft.utils.notebook import display_audio\n",
+ "\n",
+ "output = model.generate(\n",
+ "    descriptions=[\n",
+ "        #'80s pop track with bassy drums and synth',\n",
+ "        #'90s rock song with loud guitars and heavy drums',\n",
+ "        #'Progressive rock drum and bass solo',\n",
+ "        #'Punk Rock song with loud drum and power guitar',\n",
+ "        #'Bluesy guitar instrumental with soulful licks and a driving rhythm section',\n",
+ "        #'Jazz Funk song with slap bass and powerful saxophone',\n",
+ "        'drum and bass beat with intense percussions'\n",
+ "    ],\n",
+ "    progress=True, return_tokens=True\n",
+ ")\n",
+ "display_audio(output[0], sample_rate=32000)\n",
+ "if USE_DIFFUSION_DECODER:\n",
+ "    out_diffusion = mbd.tokens_to_wav(output[1])\n",
+ "    display_audio(out_diffusion, sample_rate=32000)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Melody-conditional Generation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torchaudio\n",
+ "from audiocraft.utils.notebook import display_audio\n",
+ "\n",
+ "model = MusicGen.get_pretrained('facebook/musicgen-melody')\n",
+ "model.set_generation_params(duration=8)\n",
+ "\n",
+ "melody_waveform, sr = torchaudio.load(\"../assets/bach.mp3\")\n",
+ "melody_waveform = melody_waveform.unsqueeze(0).repeat(2, 1, 1)\n",
+ "output = model.generate_with_chroma(\n",
+ "    descriptions=[\n",
+ "        '80s pop track with bassy drums and synth',\n",
+ "        '90s rock song with loud guitars and heavy drums',\n",
+ "    ],\n",
+ "    melody_wavs=melody_waveform,\n",
+ "    melody_sample_rate=sr,\n",
+ "    progress=True, return_tokens=True\n",
+ ")\n",
+ "display_audio(output[0], sample_rate=32000)\n",
+ "if USE_DIFFUSION_DECODER:\n",
+ "    out_diffusion = mbd.tokens_to_wav(output[1])\n",
+ "    display_audio(out_diffusion, sample_rate=32000)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.16"
+ },
+ "vscode": {
+ "interpreter": {
+ "hash": "b02c911f9b3627d505ea4a19966a915ef21f28afb50dbf6b2115072d27c69103"
+ }
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+ }
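Both notebooks expose `cfg_coef`, the classifier-free guidance coefficient (default 3.0). In the usual formulation, the model is run once with the text condition and once without it at each step. The final logits then extrapolate from the unconditional prediction towards the conditional one: `logits = uncond + cfg_coef * (cond - uncond)`. We assume AudioCraft follows this standard form, but that is an assumption. The sketch below only illustrates the arithmetic on plain Python lists, and `apply_cfg` is a hypothetical name, not a library function.

```python
from typing import List


def apply_cfg(cond_logits: List[float], uncond_logits: List[float],
              cfg_coef: float = 3.0) -> List[float]:
    """Hypothetical helper: classifier-free guidance on one step's logits.

    cfg_coef = 1.0 returns the conditional logits unchanged;
    cfg_coef > 1.0 pushes further away from the unconditional prediction.
    """
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("logit vectors must have the same length")
    return [u + cfg_coef * (c - u) for c, u in zip(cond_logits, uncond_logits)]


cond = [2.0, 0.0, -1.0]
uncond = [1.0, 0.5, -1.0]
print(apply_cfg(cond, uncond, cfg_coef=1.0))  # [2.0, 0.0, -1.0]
print(apply_cfg(cond, uncond, cfg_coef=3.0))  # [4.0, -1.0, -1.0]
```

With `cfg_coef=3.0`, the gap between token 0 and token 1 grows from 2.0 to 5.0. The text condition therefore has a stronger effect on which token is sampled next.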