2002 Turkish pop, a style of energetic, danceable songs. Take artists like Ajda Pekkan, Tarkan, Sertab Erener and Mustafa Sandal as examples and compose a song.

#20
This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .github/actions/audiocraft_build/action.yml +0 -2
  2. .github/workflows/audiocraft_docs.yml +3 -3
  3. .github/workflows/audiocraft_tests.yml +1 -6
  4. .gitignore +1 -8
  5. CHANGELOG.md +1 -34
  6. CONTRIBUTING.md +2 -2
  7. LICENSE_weights +157 -399
  8. MANIFEST.in +0 -7
  9. model_cards/MUSICGEN_MODEL_CARD.md → MODEL_CARD.md +7 -31
  10. Makefile +4 -23
  11. README.md +80 -46
  12. app.py +216 -0
  13. app_batched.py +222 -0
  14. assets/a_duck_quacking_as_birds_chirp_and_a_pigeon_cooing.mp3 +0 -0
  15. assets/sirens_and_a_humming_engine_approach_and_pass.mp3 +0 -0
  16. audiocraft/__init__.py +1 -17
  17. audiocraft/adversarial/__init__.py +0 -22
  18. audiocraft/adversarial/discriminators/__init__.py +0 -10
  19. audiocraft/adversarial/discriminators/base.py +0 -34
  20. audiocraft/adversarial/discriminators/mpd.py +0 -106
  21. audiocraft/adversarial/discriminators/msd.py +0 -126
  22. audiocraft/adversarial/discriminators/msstftd.py +0 -134
  23. audiocraft/adversarial/losses.py +0 -228
  24. audiocraft/data/__init__.py +1 -3
  25. audiocraft/data/audio.py +21 -37
  26. audiocraft/data/audio_dataset.py +31 -93
  27. audiocraft/data/audio_utils.py +10 -12
  28. audiocraft/data/info_audio_dataset.py +0 -110
  29. audiocraft/data/music_dataset.py +0 -270
  30. audiocraft/data/sound_dataset.py +0 -330
  31. audiocraft/data/zip.py +6 -8
  32. audiocraft/environment.py +0 -176
  33. audiocraft/grids/__init__.py +0 -6
  34. audiocraft/grids/_base_explorers.py +0 -80
  35. audiocraft/grids/audiogen/__init__.py +0 -6
  36. audiocraft/grids/audiogen/audiogen_base_16khz.py +0 -23
  37. audiocraft/grids/audiogen/audiogen_pretrained_16khz_eval.py +0 -68
  38. audiocraft/grids/compression/__init__.py +0 -6
  39. audiocraft/grids/compression/_explorers.py +0 -55
  40. audiocraft/grids/compression/debug.py +0 -31
  41. audiocraft/grids/compression/encodec_audiogen_16khz.py +0 -29
  42. audiocraft/grids/compression/encodec_base_24khz.py +0 -28
  43. audiocraft/grids/compression/encodec_musicgen_32khz.py +0 -34
  44. audiocraft/grids/diffusion/4_bands_base_32khz.py +0 -27
  45. audiocraft/grids/diffusion/__init__.py +0 -6
  46. audiocraft/grids/diffusion/_explorers.py +0 -66
  47. audiocraft/grids/musicgen/__init__.py +0 -6
  48. audiocraft/grids/musicgen/_explorers.py +0 -93
  49. audiocraft/grids/musicgen/musicgen_base_32khz.py +0 -43
  50. audiocraft/grids/musicgen/musicgen_base_cached_32khz.py +0 -67
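The `+added -deleted` counts in this list can be summed mechanically to get a diffstat total. The following is a minimal Python sketch. It assumes every entry ends in the `+A -D` shape used above. `tally_diffstat` is a hypothetical helper and is not part of the repository:

```python
import re

# Matches the trailing "+A -D" counts on a file-list entry such as
# "11. README.md +80 -46" (format assumed from the list above).
_COUNTS = re.compile(r"\+(\d+)\s+-(\d+)\s*$")


def tally_diffstat(lines):
    """Return (files, added, deleted) summed over the given file-list lines."""
    files = added = deleted = 0
    for line in lines:
        m = _COUNTS.search(line.strip())
        if m is None:
            continue  # skip lines without a "+A -D" suffix
        files += 1
        added += int(m.group(1))
        deleted += int(m.group(2))
    return files, added, deleted


sample = [
    "10. Makefile +4 -23",
    "11. README.md +80 -46",
    "12. app.py +216 -0",
]
print(tally_diffstat(sample))  # (3, 300, 69)
```

Binary assets listed as `+0 -0` still count as changed files but add nothing to the line totals.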
.github/actions/audiocraft_build/action.yml CHANGED
@@ -21,8 +21,6 @@ runs:
  python3 -m venv env
  . env/bin/activate
  python -m pip install --upgrade pip
- pip install torch torchvision torchaudio
- pip install xformers
  pip install -e '.[dev]'
  - name: System Dependencies
  shell: bash
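In a hunk header `@@ -a,b +c,d @@`, `a,b` give the starting line and line count on the old side, and `c,d` give the same for the new side. Context lines count toward both sides, `-` lines count only toward the old side, and `+` lines count only toward the new side. The sketch below checks the `action.yml` hunk against its header. It uses only the Python standard library, and the helper names are hypothetical. It assumes the body keeps the one-character `' '`/`-`/`+` prefixes shown here. It also assumes that an omitted count means 1, as in the unified format:

```python
import re

# "@@ -a,b +c,d @@ optional section heading"; a missing ",b" or ",d" means 1.
_HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_header(header):
    """Return (old_start, old_len, new_start, new_len) from a hunk header."""
    m = _HUNK.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_len, new_start, new_len = m.groups()
    return (int(old_start), int(old_len or 1), int(new_start), int(new_len or 1))


def check_hunk(header, body):
    """Return True if the body's per-side line counts match the header."""
    _, old_len, _, new_len = parse_header(header)
    old = sum(1 for line in body if line[:1] in (" ", "-"))  # context + removed
    new = sum(1 for line in body if line[:1] in (" ", "+"))  # context + added
    return (old, new) == (old_len, new_len)


header = "@@ -21,8 +21,6 @@ runs:"
body = [
    "  python3 -m venv env",
    "  . env/bin/activate",
    "  python -m pip install --upgrade pip",
    "- pip install torch torchvision torchaudio",
    "- pip install xformers",
    "  pip install -e '.[dev]'",
    "  - name: System Dependencies",
    "  shell: bash",
]
print(parse_header(header))      # (21, 8, 21, 6)
print(check_hunk(header, body))  # True
```

In a real patch, a blank context line is a single space. This sketch therefore counts an empty string toward neither side.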
.github/workflows/audiocraft_docs.yml CHANGED
@@ -23,9 +23,9 @@ jobs:
  - name: Make docs
  run: |
  . env/bin/activate
- make api_docs
- git add -f api_docs
- git commit -m api_docs
+ make docs
+ git add -f docs
+ git commit -m docs
 
  - name: Push branch
  run: |
.github/workflows/audiocraft_tests.yml CHANGED
@@ -12,11 +12,6 @@ jobs:
  steps:
  - uses: actions/checkout@v2
  - uses: ./.github/actions/audiocraft_build
- - name: Run unit tests
- run: |
+ - run: |
  . env/bin/activate
  make tests
- - name: Run integration tests
- run: |
- . env/bin/activate
- make tests_integ
.gitignore CHANGED
@@ -35,7 +35,7 @@ wheels/
  .coverage
 
  # docs
- /api_docs
+ /docs
 
  # dotenv
  .env
@@ -46,13 +46,6 @@ wheels/
  venv/
  ENV/
 
- # egs with manifest files
- egs/*
- !egs/example
- # local datasets
- dataset/*
- !dataset/example
-
  # personal notebooks & scripts
  */local_scripts
  */notes
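The removed `.gitignore` block relied on negation. `egs/*` ignores every entry directly under `egs/`, and the later `!egs/example` re-includes the example directory. The `dataset/` rules work the same way. The sketch below models this "last matching rule wins" behaviour in simplified Python, using `fnmatch.fnmatchcase` as a stand-in for git's own matcher. It skips anchoring and directory-only rules. It also differs from git in one important way: git's `*` does not cross `/`, but fnmatch's does. Treat it as an illustration for one-level paths only:

```python
from fnmatch import fnmatchcase

# The rules removed from .gitignore in this commit.
RULES = ["egs/*", "!egs/example", "dataset/*", "!dataset/example"]


def is_ignored(path, rules=RULES):
    """Simplified gitignore check: the last matching rule wins, and a
    leading '!' re-includes a path. Unlike git, fnmatch's '*' also
    matches '/', so only use this for entries one level deep."""
    ignored = False
    for rule in rules:
        negate = rule.startswith("!")
        pattern = rule[1:] if negate else rule
        if fnmatchcase(path, pattern):
            ignored = not negate
    return ignored


for p in ["egs/my_manifests", "egs/example", "dataset/train", "dataset/example", "README.md"]:
    print(p, is_ignored(p))
```

To see which rule git itself uses to ignore a given path, run `git check-ignore -v <path>`.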
CHANGELOG.md CHANGED
@@ -4,37 +4,7 @@ All notable changes to this project will be documented in this file.
 
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
- ## [1.2.0a] - TBD
-
- Adding stereo models.
-
-
- ## [1.1.0] - 2023-11-06
-
- Not using torchaudio anymore when writing audio files, relying instead directly on the commandline ffmpeg. Also not using it anymore for reading audio files, for similar reasons.
-
- Fixed DAC support with non default number of codebooks.
-
- Fixed bug when `two_step_cfg` was overriden when calling `generate()`.
-
- Fixed samples being always prompted with audio, rather than having both prompted and unprompted.
-
- **Backward incompatible change:** A `torch.no_grad` around the computation of the conditioning made its way in the public release.
- The released models were trained without this. Those impact linear layers applied to the output of the T5 or melody conditioners.
- We removed it, so you might need to retrain models.
-
- **Backward incompatible change:** Fixing wrong sample rate in CLAP (WARNING if you trained model with CLAP before).
-
- **Backward incompatible change:** Renamed VALLEPattern to CoarseFirstPattern, as it was wrongly named. Probably no one
- retrained a model with this pattern, so hopefully this won't impact you!
-
-
- ## [1.0.0] - 2023-09-07
-
- Major revision, added training code for EnCodec, AudioGen, MusicGen, and MultiBandDiffusion.
- Added pretrained model for AudioGen and MultiBandDiffusion.
-
- ## [0.0.2] - 2023-08-01
+ ## [0.0.2a] - TBD
 
  Improved demo, fixed top p (thanks @jnordberg).
 
@@ -45,9 +15,6 @@ More options when launching Gradio app locally (thanks @ashleykleynhans).
 
  Testing out PyTorch 2.0 memory efficient attention.
 
- Added extended generation (infinite length) by slowly moving the windows.
- Note that other implementations exist: https://github.com/camenduru/MusicGen-colab.
-
  ## [0.0.1] - 2023-06-09
 
  Initial release, with model evaluation only.
CONTRIBUTING.md CHANGED
@@ -1,11 +1,11 @@
- # Contributing to AudioCraft
+ # Contributing to Audiocraft
 
  We want to make contributing to this project as easy and transparent as
  possible.
 
  ## Pull Requests
 
- AudioCraft is the implementation of a research paper.
+ Audiocraft is the implementation of a research paper.
  Therefore, we do not plan on accepting many pull requests for new features.
  We certainly welcome them for bug fixes.
 
LICENSE_weights CHANGED
@@ -1,399 +1,157 @@
- Attribution-NonCommercial 4.0 International
-
- =======================================================================
-
- Creative Commons Corporation ("Creative Commons") is not a law firm and
- does not provide legal services or legal advice. Distribution of
- Creative Commons public licenses does not create a lawyer-client or
- other relationship. Creative Commons makes its licenses and related
- information available on an "as-is" basis. Creative Commons gives no
- warranties regarding its licenses, any material licensed under their
- terms and conditions, or any related information. Creative Commons
- disclaims all liability for damages resulting from their use to the
- fullest extent possible.
-
- Using Creative Commons Public Licenses
-
- Creative Commons public licenses provide a standard set of terms and
- conditions that creators and other rights holders may use to share
- original works of authorship and other material subject to copyright
- and certain other rights specified in the public license below. The
- following considerations are for informational purposes only, are not
- exhaustive, and do not form part of our licenses.
-
- Considerations for licensors: Our public licenses are
- intended for use by those authorized to give the public
- permission to use material in ways otherwise restricted by
- copyright and certain other rights. Our licenses are
- irrevocable. Licensors should read and understand the terms
- and conditions of the license they choose before applying it.
- Licensors should also secure all rights necessary before
- applying our licenses so that the public can reuse the
- material as expected. Licensors should clearly mark any
- material not subject to the license. This includes other CC-
- licensed material, or material used under an exception or
- limitation to copyright. More considerations for licensors:
- wiki.creativecommons.org/Considerations_for_licensors
-
- Considerations for the public: By using one of our public
- licenses, a licensor grants the public permission to use the
- licensed material under specified terms and conditions. If
- the licensor's permission is not necessary for any reason--for
- example, because of any applicable exception or limitation to
- copyright--then that use is not regulated by the license. Our
- licenses grant only permissions under copyright and certain
- other rights that a licensor has authority to grant. Use of
- the licensed material may still be restricted for other
- reasons, including because others have copyright or other
- rights in the material. A licensor may make special requests,
- such as asking that all changes be marked or described.
- Although not required by our licenses, you are encouraged to
- respect those requests where reasonable. More_considerations
- for the public:
- wiki.creativecommons.org/Considerations_for_licensees
-
- =======================================================================
-
- Creative Commons Attribution-NonCommercial 4.0 International Public
- License
-
- By exercising the Licensed Rights (defined below), You accept and agree
- to be bound by the terms and conditions of this Creative Commons
- Attribution-NonCommercial 4.0 International Public License ("Public
- License"). To the extent this Public License may be interpreted as a
- contract, You are granted the Licensed Rights in consideration of Your
- acceptance of these terms and conditions, and the Licensor grants You
- such rights in consideration of benefits the Licensor receives from
- making the Licensed Material available under these terms and
- conditions.
-
- Section 1 -- Definitions.
-
- a. Adapted Material means material subject to Copyright and Similar
- Rights that is derived from or based upon the Licensed Material
- and in which the Licensed Material is translated, altered,
- arranged, transformed, or otherwise modified in a manner requiring
- permission under the Copyright and Similar Rights held by the
- Licensor. For purposes of this Public License, where the Licensed
- Material is a musical work, performance, or sound recording,
- Adapted Material is always produced where the Licensed Material is
- synched in timed relation with a moving image.
-
- b. Adapter's License means the license You apply to Your Copyright
- and Similar Rights in Your contributions to Adapted Material in
- accordance with the terms and conditions of this Public License.
-
- c. Copyright and Similar Rights means copyright and/or similar rights
- closely related to copyright including, without limitation,
- performance, broadcast, sound recording, and Sui Generis Database
- Rights, without regard to how the rights are labeled or
- categorized. For purposes of this Public License, the rights
- specified in Section 2(b)(1)-(2) are not Copyright and Similar
- Rights.
- d. Effective Technological Measures means those measures that, in the
- absence of proper authority, may not be circumvented under laws
- fulfilling obligations under Article 11 of the WIPO Copyright
- Treaty adopted on December 20, 1996, and/or similar international
- agreements.
-
- e. Exceptions and Limitations means fair use, fair dealing, and/or
- any other exception or limitation to Copyright and Similar Rights
- that applies to Your use of the Licensed Material.
-
- f. Licensed Material means the artistic or literary work, database,
- or other material to which the Licensor applied this Public
- License.
-
- g. Licensed Rights means the rights granted to You subject to the
- terms and conditions of this Public License, which are limited to
- all Copyright and Similar Rights that apply to Your use of the
- Licensed Material and that the Licensor has authority to license.
-
- h. Licensor means the individual(s) or entity(ies) granting rights
- under this Public License.
-
- i. NonCommercial means not primarily intended for or directed towards
- commercial advantage or monetary compensation. For purposes of
- this Public License, the exchange of the Licensed Material for
- other material subject to Copyright and Similar Rights by digital
- file-sharing or similar means is NonCommercial provided there is
- no payment of monetary compensation in connection with the
- exchange.
-
- j. Share means to provide material to the public by any means or
- process that requires permission under the Licensed Rights, such
- as reproduction, public display, public performance, distribution,
- dissemination, communication, or importation, and to make material
- available to the public including in ways that members of the
- public may access the material from a place and at a time
- individually chosen by them.
-
- k. Sui Generis Database Rights means rights other than copyright
- resulting from Directive 96/9/EC of the European Parliament and of
- the Council of 11 March 1996 on the legal protection of databases,
- as amended and/or succeeded, as well as other essentially
- equivalent rights anywhere in the world.
-
- l. You means the individual or entity exercising the Licensed Rights
- under this Public License. Your has a corresponding meaning.
-
- Section 2 -- Scope.
-
- a. License grant.
-
- 1. Subject to the terms and conditions of this Public License,
- the Licensor hereby grants You a worldwide, royalty-free,
- non-sublicensable, non-exclusive, irrevocable license to
- exercise the Licensed Rights in the Licensed Material to:
-
- a. reproduce and Share the Licensed Material, in whole or
- in part, for NonCommercial purposes only; and
-
- b. produce, reproduce, and Share Adapted Material for
- NonCommercial purposes only.
-
- 2. Exceptions and Limitations. For the avoidance of doubt, where
- Exceptions and Limitations apply to Your use, this Public
- License does not apply, and You do not need to comply with
- its terms and conditions.
-
- 3. Term. The term of this Public License is specified in Section
- 6(a).
-
- 4. Media and formats; technical modifications allowed. The
- Licensor authorizes You to exercise the Licensed Rights in
- all media and formats whether now known or hereafter created,
- and to make technical modifications necessary to do so. The
- Licensor waives and/or agrees not to assert any right or
- authority to forbid You from making technical modifications
- necessary to exercise the Licensed Rights, including
- technical modifications necessary to circumvent Effective
- Technological Measures. For purposes of this Public License,
- simply making modifications authorized by this Section 2(a)
- (4) never produces Adapted Material.
-
- 5. Downstream recipients.
-
- a. Offer from the Licensor -- Licensed Material. Every
- recipient of the Licensed Material automatically
- receives an offer from the Licensor to exercise the
- Licensed Rights under the terms and conditions of this
- Public License.
-
- b. No downstream restrictions. You may not offer or impose
- any additional or different terms or conditions on, or
- apply any Effective Technological Measures to, the
- Licensed Material if doing so restricts exercise of the
- Licensed Rights by any recipient of the Licensed
- Material.
-
- 6. No endorsement. Nothing in this Public License constitutes or
- may be construed as permission to assert or imply that You
- are, or that Your use of the Licensed Material is, connected
- with, or sponsored, endorsed, or granted official status by,
- the Licensor or others designated to receive attribution as
- provided in Section 3(a)(1)(A)(i).
-
- b. Other rights.
-
- 1. Moral rights, such as the right of integrity, are not
- licensed under this Public License, nor are publicity,
- privacy, and/or other similar personality rights; however, to
- the extent possible, the Licensor waives and/or agrees not to
- assert any such rights held by the Licensor to the limited
- extent necessary to allow You to exercise the Licensed
- Rights, but not otherwise.
-
- 2. Patent and trademark rights are not licensed under this
- Public License.
-
- 3. To the extent possible, the Licensor waives any right to
- collect royalties from You for the exercise of the Licensed
- Rights, whether directly or through a collecting society
- under any voluntary or waivable statutory or compulsory
- licensing scheme. In all other cases the Licensor expressly
- reserves any right to collect such royalties, including when
- the Licensed Material is used other than for NonCommercial
- purposes.
-
- Section 3 -- License Conditions.
-
- Your exercise of the Licensed Rights is expressly made subject to the
- following conditions.
-
- a. Attribution.
-
- 1. If You Share the Licensed Material (including in modified
- form), You must:
-
- a. retain the following if it is supplied by the Licensor
- with the Licensed Material:
-
- i. identification of the creator(s) of the Licensed
- Material and any others designated to receive
- attribution, in any reasonable manner requested by
- the Licensor (including by pseudonym if
- designated);
-
- ii. a copyright notice;
-
- iii. a notice that refers to this Public License;
-
- iv. a notice that refers to the disclaimer of
- warranties;
-
- v. a URI or hyperlink to the Licensed Material to the
- extent reasonably practicable;
-
- b. indicate if You modified the Licensed Material and
- retain an indication of any previous modifications; and
-
- c. indicate the Licensed Material is licensed under this
- Public License, and include the text of, or the URI or
- hyperlink to, this Public License.
-
- 2. You may satisfy the conditions in Section 3(a)(1) in any
- reasonable manner based on the medium, means, and context in
- which You Share the Licensed Material. For example, it may be
- reasonable to satisfy the conditions by providing a URI or
- hyperlink to a resource that includes the required
- information.
-
- 3. If requested by the Licensor, You must remove any of the
- information required by Section 3(a)(1)(A) to the extent
- reasonably practicable.
-
- 4. If You Share Adapted Material You produce, the Adapter's
- License You apply must not prevent recipients of the Adapted
- Material from complying with this Public License.
-
- Section 4 -- Sui Generis Database Rights.
-
- Where the Licensed Rights include Sui Generis Database Rights that
- apply to Your use of the Licensed Material:
-
- a. for the avoidance of doubt, Section 2(a)(1) grants You the right
- to extract, reuse, reproduce, and Share all or a substantial
- portion of the contents of the database for NonCommercial purposes
- only;
-
- b. if You include all or a substantial portion of the database
- contents in a database in which You have Sui Generis Database
- Rights, then the database in which You have Sui Generis Database
- Rights (but not its individual contents) is Adapted Material; and
-
- c. You must comply with the conditions in Section 3(a) if You Share
- all or a substantial portion of the contents of the database.
-
- For the avoidance of doubt, this Section 4 supplements and does not
- replace Your obligations under this Public License where the Licensed
- Rights include other Copyright and Similar Rights.
-
- Section 5 -- Disclaimer of Warranties and Limitation of Liability.
-
- a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
- EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
- AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
- ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
- IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
- WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
- PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
- ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
- KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
- ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
-
- b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
- TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
- NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
- INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
- COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
- USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
- ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
- DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
- IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
-
- c. The disclaimer of warranties and limitation of liability provided
- above shall be interpreted in a manner that, to the extent
- possible, most closely approximates an absolute disclaimer and
- waiver of all liability.
-
- Section 6 -- Term and Termination.
-
- a. This Public License applies for the term of the Copyright and
- Similar Rights licensed here. However, if You fail to comply with
- this Public License, then Your rights under this Public License
- terminate automatically.
-
- b. Where Your right to use the Licensed Material has terminated under
- Section 6(a), it reinstates:
-
- 1. automatically as of the date the violation is cured, provided
- it is cured within 30 days of Your discovery of the
- violation; or
-
- 2. upon express reinstatement by the Licensor.
-
- For the avoidance of doubt, this Section 6(b) does not affect any
- right the Licensor may have to seek remedies for Your violations
- of this Public License.
-
- c. For the avoidance of doubt, the Licensor may also offer the
- Licensed Material under separate terms or conditions or stop
- distributing the Licensed Material at any time; however, doing so
- will not terminate this Public License.
-
- d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
- License.
-
- Section 7 -- Other Terms and Conditions.
-
- a. The Licensor shall not be bound by any additional or different
- terms or conditions communicated by You unless expressly agreed.
-
- b. Any arrangements, understandings, or agreements regarding the
- Licensed Material not stated herein are separate from and
- independent of the terms and conditions of this Public License.
-
- Section 8 -- Interpretation.
-
- a. For the avoidance of doubt, this Public License does not, and
- shall not be interpreted to, reduce, limit, restrict, or impose
- conditions on any use of the Licensed Material that could lawfully
- be made without permission under this Public License.
-
- b. To the extent possible, if any provision of this Public License is
- deemed unenforceable, it shall be automatically reformed to the
- minimum extent necessary to make it enforceable. If the provision
- cannot be reformed, it shall be severed from this Public License
- without affecting the enforceability of the remaining terms and
- conditions.
-
- c. No term or condition of this Public License will be waived and no
- failure to comply consented to unless expressly agreed to by the
- Licensor.
-
- d. Nothing in this Public License constitutes or may be interpreted
- as a limitation upon, or waiver of, any privileges and immunities
- that apply to the Licensor or You, including from the legal
- processes of any jurisdiction or authority.
-
- =======================================================================
-
- Creative Commons is not a party to its public
- licenses. Notwithstanding, Creative Commons may elect to apply one of
- its public licenses to material it publishes and in those instances
- will be considered the “Licensor.” The text of the Creative Commons
- public licenses is dedicated to the public domain under the CC0 Public
- Domain Dedication. Except for the limited purpose of indicating that
- material is shared under a Creative Commons public license or as
- otherwise permitted by the Creative Commons policies published at
- creativecommons.org/policies, Creative Commons does not authorize the
- use of the trademark "Creative Commons" or any other trademark or logo
- of Creative Commons without its prior written consent including,
- without limitation, in connection with any unauthorized modifications
- to any of its public licenses or any other arrangements,
- understandings, or agreements concerning use of licensed material. For
- the avoidance of doubt, this paragraph does not form part of the
- public licenses.
-
- Creative Commons may be contacted at creativecommons.org.
1
+ # Attribution-NonCommercial-NoDerivatives 4.0 International
2
+
3
+ > *Creative Commons Corporation (“Creative Commons”) is not a law firm and does not provide legal services or legal advice. Distribution of Creative Commons public licenses does not create a lawyer-client or other relationship. Creative Commons makes its licenses and related information available on an “as-is” basis. Creative Commons gives no warranties regarding its licenses, any material licensed under their terms and conditions, or any related information. Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible.*
4
+ >
5
+ > ### Using Creative Commons Public Licenses
6
+ >
7
+ > Creative Commons public licenses provide a standard set of terms and conditions that creators and other rights holders may use to share original works of authorship and other material subject to copyright and certain other rights specified in the public license below. The following considerations are for informational purposes only, are not exhaustive, and do not form part of our licenses.
8
+ >
9
+ > * __Considerations for licensors:__ Our public licenses are intended for use by those authorized to give the public permission to use material in ways otherwise restricted by copyright and certain other rights. Our licenses are irrevocable. Licensors should read and understand the terms and conditions of the license they choose before applying it. Licensors should also secure all rights necessary before applying our licenses so that the public can reuse the material as expected. Licensors should clearly mark any material not subject to the license. This includes other CC-licensed material, or material used under an exception or limitation to copyright. [More considerations for licensors](http://wiki.creativecommons.org/Considerations_for_licensors_and_licensees#Considerations_for_licensors).
10
+ >
11
+ > * __Considerations for the public:__ By using one of our public licenses, a licensor grants the public permission to use the licensed material under specified terms and conditions. If the licensor’s permission is not necessary for any reason–for example, because of any applicable exception or limitation to copyright–then that use is not regulated by the license. Our licenses grant only permissions under copyright and certain other rights that a licensor has authority to grant. Use of the licensed material may still be restricted for other reasons, including because others have copyright or other rights in the material. A licensor may make special requests, such as asking that all changes be marked or described. Although not required by our licenses, you are encouraged to respect those requests where reasonable. [More considerations for the public](http://wiki.creativecommons.org/Considerations_for_licensors_and_licensees#Considerations_for_licensees).
12
+
13
+ ## Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License
14
+
15
+ By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions.
16
+
17
+ ### Section 1 Definitions.
18
+
19
+ a. __Adapted Material__ means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image.
20
+
21
+ b. __Copyright and Similar Rights__ means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights.
22
+
23
+ c. __Effective Technological Measures__ means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements.
24
+
25
+ d. __Exceptions and Limitations__ means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material.
26
+
27
+ e. __Licensed Material__ means the artistic or literary work, database, or other material to which the Licensor applied this Public License.
28
+
29
+ f. __Licensed Rights__ means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license.
30
+
31
+ g. __Licensor__ means the individual(s) or entity(ies) granting rights under this Public License.
32
+
33
+ h. __NonCommercial__ means not primarily intended for or directed towards commercial advantage or monetary compensation. For purposes of this Public License, the exchange of the Licensed Material for other material subject to Copyright and Similar Rights by digital file-sharing or similar means is NonCommercial provided there is no payment of monetary compensation in connection with the exchange.
34
+
35
+ i. __Share__ means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them.
36
+
37
+ j. __Sui Generis Database Rights__ means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world.
38
+
39
+ k. __You__ means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning.
40
+
41
+ ### Section 2 – Scope.
42
+
43
+ a. ___License grant.___
44
+
45
+ 1. Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to:
46
+
47
+ A. reproduce and Share the Licensed Material, in whole or in part, for NonCommercial purposes only; and
48
+
49
+ B. produce and reproduce, but not Share, Adapted Material for NonCommercial purposes only.
50
+
51
+ 2. __Exceptions and Limitations.__ For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions.
52
+
53
+ 3. __Term.__ The term of this Public License is specified in Section 6(a).
54
+
55
+ 4. __Media and formats; technical modifications allowed.__ The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material.
56
+
57
+ 5. __Downstream recipients.__
58
+
59
+ A. __Offer from the Licensor – Licensed Material.__ Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License.
60
+
61
+ B. __No downstream restrictions.__ You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.
62
+
63
+ 6. __No endorsement.__ Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i).
64
+
65
+ b. ___Other rights.___
66
+
67
+ 1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise.
68
+
69
+ 2. Patent and trademark rights are not licensed under this Public License.
70
+
71
+ 3. To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties, including when the Licensed Material is used other than for NonCommercial purposes.
72
+
73
+ ### Section 3 – License Conditions.
74
+
75
+ Your exercise of the Licensed Rights is expressly made subject to the following conditions.
76
+
77
+ a. ___Attribution.___
78
+
79
+ 1. If You Share the Licensed Material, You must:
80
+
81
+ A. retain the following if it is supplied by the Licensor with the Licensed Material:
82
+
83
+ i. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated);
84
+
85
+ ii. a copyright notice;
86
+
87
+ iii. a notice that refers to this Public License;
88
+
89
+ iv. a notice that refers to the disclaimer of warranties;
90
+
91
+ v. a URI or hyperlink to the Licensed Material to the extent reasonably practicable;
92
+
93
+ B. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and
94
+
95
+ C. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License.
96
+
97
+ For the avoidance of doubt, You do not have permission under this Public License to Share Adapted Material.
98
+
99
+ 2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information.
100
+
101
+ 3. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable.
102
+
103
+ ### Section 4 – Sui Generis Database Rights.
104
+
105
+ Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material:
106
+
107
+ a. for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database for NonCommercial purposes only and provided You do not Share Adapted Material;
108
+
109
+ b. if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material; and
110
+
111
+ c. You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database.
112
+
113
+ For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights.
114
+
115
+ ### Section 5 – Disclaimer of Warranties and Limitation of Liability.
116
+
117
+ a. __Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You.__
118
+
119
+ b. __To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You.__
120
+
121
+ c. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability.
122
+
123
+ ### Section 6 – Term and Termination.
124
+
125
+ a. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically.
126
+
127
+ b. Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates:
128
+
129
+ 1. automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or
130
+
131
+ 2. upon express reinstatement by the Licensor.
132
+
133
+ For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License.
134
+
135
+ c. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License.
136
+
137
+ d. Sections 1, 5, 6, 7, and 8 survive termination of this Public License.
138
+
139
+ ### Section 7 – Other Terms and Conditions.
140
+
141
+ a. The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed.
142
+
143
+ b. Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License.
144
+
145
+ ### Section 8 – Interpretation.
146
+
147
+ a. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License.
148
+
149
+ b. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions.
150
+
151
+ c. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor.
152
+
153
+ d. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority.
154
+
155
+ > Creative Commons is not a party to its public licenses. Notwithstanding, Creative Commons may elect to apply one of its public licenses to material it publishes and in those instances will be considered the “Licensor.” Except for the limited purpose of indicating that material is shared under a Creative Commons public license or as otherwise permitted by the Creative Commons policies published at [creativecommons.org/policies](http://creativecommons.org/policies), Creative Commons does not authorize the use of the trademark “Creative Commons” or any other trademark or logo of Creative Commons without its prior written consent including, without limitation, in connection with any unauthorized modifications to any of its public licenses or any other arrangements, understandings, or agreements concerning use of licensed material. For the avoidance of doubt, this paragraph does not form part of the public licenses.
156
+ >
157
+ > Creative Commons may be contacted at [creativecommons.org](http://creativecommons.org).
MANIFEST.in CHANGED
@@ -6,10 +6,3 @@ include *.ini
6
  include requirements.txt
7
  include audiocraft/py.typed
8
  include assets/*.mp3
9
- include datasets/*.mp3
10
- recursive-include config *.yaml
11
- recursive-include demos *.py
12
- recursive-include demos *.ipynb
13
- recursive-include scripts *.py
14
- recursive-include model_cards *.md
15
- recursive-include docs *.md
 
6
  include requirements.txt
7
  include audiocraft/py.typed
8
  include assets/*.mp3
model_cards/MUSICGEN_MODEL_CARD.md → MODEL_CARD.md RENAMED
@@ -12,11 +12,11 @@
12
 
13
  **Paper or resources for more information:** More information can be found in the paper [Simple and Controllable Music Generation][arxiv].
14
 
15
- **Citation details:** See [our paper][arxiv]
16
 
17
- **License:** Code is released under MIT, model weights are released under CC-BY-NC 4.0.
18
 
19
- **Where to send questions or comments about the model:** Questions and comments about MusicGen can be sent via the [GitHub repository](https://github.com/facebookresearch/audiocraft) of the project, or by opening an issue.
20
 
21
  ## Intended use
22
  **Primary intended use:** The primary use of MusicGen is research on AI-based music generation, including:
@@ -26,7 +26,7 @@
26
 
27
  **Primary intended users:** The primary intended users of the model are researchers in audio, machine learning and artificial intelligence, as well as amateurs seeking to better understand those models.
28
 
29
- **Out-of-scope use cases:** The model should not be used on downstream applications without further risk evaluation and mitigation. The model should not be used to intentionally create or disseminate music pieces that create hostile or alienating environments for people. This includes generating music that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.
30
 
31
  ## Metrics
32
 
@@ -54,24 +54,15 @@ The model was evaluated on the [MusicCaps benchmark](https://www.kaggle.com/data
54
 
55
  The model was trained on licensed data using the following sources: the [Meta Music Initiative Sound Collection](https://www.fb.com/sound), [Shutterstock music collection](https://www.shutterstock.com/music) and the [Pond5 music collection](https://www.pond5.com/). See the paper for more details about the training set and corresponding preprocessing.
56
 
57
- ## Evaluation results
58
 
59
- Below are the objective metrics obtained on MusicCaps with the released model. Note that for the publicly released models, we had all the datasets go through a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs), in order to keep only the instrumental part. This explains the difference in objective metrics with the models used in the paper.
60
-
61
- | Model | Frechet Audio Distance | KLD | Text Consistency | Chroma Cosine Similarity |
62
- |---|---|---|---|---|
63
- | facebook/musicgen-small | 4.88 | 1.42 | 0.27 | - |
64
- | facebook/musicgen-medium | 5.14 | 1.38 | 0.28 | - |
65
- | facebook/musicgen-large | 5.48 | 1.37 | 0.28 | - |
66
- | facebook/musicgen-melody | 4.93 | 1.41 | 0.27 | 0.44 |
67
-
68
- More information can be found in the paper [Simple and Controllable Music Generation][arxiv], in the Results section.
69
 
70
  ## Limitations and biases
71
 
72
  **Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on 20K hours of data; we believe that scaling the model on larger datasets can further improve the performance of the model.
73
 
74
- **Mitigations:** Vocals have been removed from the data source using corresponding tags, and then using a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs).
75
 
76
  **Limitations:**
77
 
@@ -87,19 +78,4 @@ More information can be found in the paper [Simple and Controllable Music Genera
87
 
88
  **Use cases:** Users must be aware of the biases, limitations and risks of the model. MusicGen is a model developed for artificial intelligence research on controllable music generation. As such, it should not be used for downstream applications without further investigation and mitigation of risks.
89
 
90
- ## Update: stereo models and large melody.
91
-
92
- We further release a set of stereophonic-capable models. These were fine-tuned for 200k updates starting
93
- from the mono models. The training data is otherwise identical, and capabilities and limitations are shared with the base models. The stereo models work by getting 2 streams of tokens from the EnCodec model, and interleaving those using
94
- the delay pattern. We also release a mono large model with melody conditioning capabilities. The list of new models
95
- is as follows:
96
-
97
- - facebook/musicgen-stereo-small
98
- - facebook/musicgen-stereo-medium
99
- - facebook/musicgen-stereo-large
100
- - facebook/musicgen-stereo-melody
101
- - facebook/musicgen-melody-large
102
- - facebook/musicgen-stereo-melody-large
103
-
104
-
105
  [arxiv]: https://arxiv.org/abs/2306.05284
 
12
 
13
  **Paper or resources for more information:** More information can be found in the paper [Simple and Controllable Music Generation][arxiv].
14
 
15
+ **Citation details:** See [our paper][arxiv]
16
 
17
+ **License:** Code is released under MIT, model weights are released under CC-BY-NC 4.0.
18
 
19
+ **Where to send questions or comments about the model:** Questions and comments about MusicGen can be sent via the [GitHub repository](https://github.com/facebookresearch/audiocraft) of the project, or by opening an issue.
20
 
21
  ## Intended use
22
  **Primary intended use:** The primary use of MusicGen is research on AI-based music generation, including:
 
26
 
27
  **Primary intended users:** The primary intended users of the model are researchers in audio, machine learning and artificial intelligence, as well as amateurs seeking to better understand those models.
28
 
29
+ **Out-of-scope use cases:** The model should not be used on downstream applications without further risk evaluation and mitigation. The model should not be used to intentionally create or disseminate music pieces that create hostile or alienating environments for people. This includes generating music that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.
30
 
31
  ## Metrics
32
 
 
54
 
55
  The model was trained on licensed data using the following sources: the [Meta Music Initiative Sound Collection](https://www.fb.com/sound), [Shutterstock music collection](https://www.shutterstock.com/music) and the [Pond5 music collection](https://www.pond5.com/). See the paper for more details about the training set and corresponding preprocessing.
56
 
57
+ ## Quantitative analysis
58
 
59
+ More information can be found in the paper [Simple and Controllable Music Generation][arxiv], in the Experimental Setup section.
60
 
61
  ## Limitations and biases
62
 
63
  **Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on 20K hours of data; we believe that scaling the model on larger datasets can further improve the performance of the model.
64
 
65
+ **Mitigations:** Vocals have been removed from the data source using corresponding tags, and then using a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs).
66
 
67
  **Limitations:**
68
 
 
78
 
79
  **Use cases:** Users must be aware of the biases, limitations and risks of the model. MusicGen is a model developed for artificial intelligence research on controllable music generation. As such, it should not be used for downstream applications without further investigation and mitigation of risks.
80
 
81
  [arxiv]: https://arxiv.org/abs/2306.05284
Makefile CHANGED
@@ -1,15 +1,3 @@
1
- INTEG=AUDIOCRAFT_DORA_DIR="/tmp/magma_$(USER)" python3 -m dora -v run --clear device=cpu dataset.num_workers=0 optim.epochs=1 \
2
- dataset.train.num_samples=10 dataset.valid.num_samples=10 \
3
- dataset.evaluate.num_samples=10 dataset.generate.num_samples=2 sample_rate=16000 \
4
- logging.level=DEBUG
5
- INTEG_COMPRESSION = $(INTEG) solver=compression/debug rvq.n_q=2 rvq.bins=48 checkpoint.save_last=true # SIG is 5091833e
6
- INTEG_MUSICGEN = $(INTEG) solver=musicgen/debug dset=audio/example compression_model_checkpoint=//sig/5091833e \
7
- transformer_lm.n_q=2 transformer_lm.card=48 transformer_lm.dim=16 checkpoint.save_last=false # Using compression model from 5091833e
8
- INTEG_AUDIOGEN = $(INTEG) solver=audiogen/debug dset=audio/example compression_model_checkpoint=//sig/5091833e \
9
- transformer_lm.n_q=2 transformer_lm.card=48 transformer_lm.dim=16 checkpoint.save_last=false # Using compression model from 5091833e
10
- INTEG_MBD = $(INTEG) solver=diffusion/debug dset=audio/example \
11
- checkpoint.save_last=false # Using compression model from 616d7b3c
12
-
13
  default: linter tests
14
 
15
  install:
@@ -22,19 +10,12 @@ linter:
22
 
23
  tests:
24
  coverage run -m pytest tests
25
- coverage report
26
-
27
- tests_integ:
28
- $(INTEG_COMPRESSION)
29
- $(INTEG_MBD)
30
- $(INTEG_MUSICGEN)
31
- $(INTEG_AUDIOGEN)
32
-
33
 
34
- api_docs:
35
- pdoc3 --html -o api_docs -f audiocraft
36
 
37
  dist:
38
  python setup.py sdist
39
 
40
- .PHONY: linter tests api_docs dist
 
1
  default: linter tests
2
 
3
  install:
 
10
 
11
  tests:
12
  coverage run -m pytest tests
13
+ coverage report --include 'audiocraft/*'
14
 
15
+ docs:
16
+ pdoc3 --html -o docs -f audiocraft
17
 
18
  dist:
19
  python setup.py sdist
20
 
21
+ .PHONY: linter tests docs dist
README.md CHANGED
@@ -5,27 +5,42 @@ tags:
5
  - "music generation"
6
  - "language models"
7
  - "LLMs"
8
- app_file: "demos/musicgen_app.py"
9
  emoji: 🎵
10
- colorFrom: gray
11
  colorTo: blue
12
  sdk: gradio
13
  sdk_version: 3.34.0
14
  pinned: true
15
  license: "cc-by-nc-4.0"
16
- disable_embedding: true
17
  ---
18
- # AudioCraft
19
  ![docs badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_docs/badge.svg)
20
  ![linter badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_linter/badge.svg)
21
  ![tests badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_tests/badge.svg)
22
 
23
- AudioCraft is a PyTorch library for deep learning research on audio generation. AudioCraft contains inference and training code
24
- for two state-of-the-art AI generative models producing high-quality audio: AudioGen and MusicGen.
25
 
26
 
27
  ## Installation
28
- AudioCraft requires Python 3.9, PyTorch 2.0.0. To install AudioCraft, you can run the following:
29
 
30
  ```shell
31
  # Best to make sure you have torch installed first, in particular before installing xformers.
@@ -34,68 +49,87 @@ pip install 'torch>=2.0'
34
  # Then proceed to one of the following
35
  pip install -U audiocraft # stable release
36
  pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft # bleeding edge
37
- pip install -e . # or if you cloned the repo locally (mandatory if you want to train).
38
  ```
39
 
40
- We also recommend having `ffmpeg` installed, either through your system or Anaconda:
41
- ```bash
42
- sudo apt-get install ffmpeg
43
- # Or if you are using Anaconda or Miniconda
44
- conda install "ffmpeg<5" -c conda-forge
45
- ```
 
46
 
47
- ## Models
48
 
49
- At the moment, AudioCraft contains the training code and inference code for:
50
- * [MusicGen](./docs/MUSICGEN.md): A state-of-the-art controllable text-to-music model.
51
- * [AudioGen](./docs/AUDIOGEN.md): A state-of-the-art text-to-sound model.
52
- * [EnCodec](./docs/ENCODEC.md): A state-of-the-art high fidelity neural audio codec.
53
- * [Multi Band Diffusion](./docs/MBD.md): An EnCodec compatible decoder using diffusion.
54
 
55
- ## Training code
56
 
57
- AudioCraft contains PyTorch components for deep learning research in audio and training pipelines for the developed models.
58
- For a general introduction to AudioCraft's design principles and instructions on developing your own training pipeline, refer to
59
- the [AudioCraft training documentation](./docs/TRAINING.md).
60
 
61
- For reproducing existing work and using the developed training pipelines, refer to the instructions for each specific model
62
- which provide pointers to configurations, example grids, and model/task-specific information and FAQs.
63
 
64
 
65
- ## API documentation
66
 
67
- We provide some [API documentation](https://facebookresearch.github.io/audiocraft/api_docs/audiocraft/index.html) for AudioCraft.
68
 
69
 
70
- ## FAQ
71
 
72
- #### Is the training code available?
73
 
74
- Yes! We provide the training code for [EnCodec](./docs/ENCODEC.md), [MusicGen](./docs/MUSICGEN.md) and [Multi Band Diffusion](./docs/MBD.md).
75
 
76
- #### Where are the models stored?
77
 
78
- Hugging Face stores the models in a specific location, which can be overridden by setting the `AUDIOCRAFT_CACHE_DIR` environment variable for the AudioCraft models.
79
- In order to change the cache location of the other Hugging Face models, please check out the [Hugging Face Transformers documentation for the cache setup](https://huggingface.co/docs/transformers/installation#cache-setup).
80
- Finally, if you use a model that relies on Demucs (e.g. `musicgen-melody`) and want to change the download location for Demucs, refer to the [Torch Hub documentation](https://pytorch.org/docs/stable/hub.html#where-are-my-downloaded-models-saved).
81
 
 
82
 
83
- ## License
84
- * The code in this repository is released under the MIT license as found in the [LICENSE file](LICENSE).
85
- * The model weights in this repository are released under the CC-BY-NC 4.0 license as found in the [LICENSE_weights file](LICENSE_weights).
86
 
87
 
88
- ## Citation
89
 
90
- For the general framework of AudioCraft, please cite the following.
91
  ```
92
  @article{copet2023simple,
93
- title={Simple and Controllable Music Generation},
94
- author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
95
- year={2023},
96
- journal={arXiv preprint arXiv:2306.05284},
97
  }
98
  ```
99
 
100
- When referring to a specific model, please cite as mentioned in the model-specific README, e.g.
101
- [./docs/MUSICGEN.md](./docs/MUSICGEN.md), [./docs/AUDIOGEN.md](./docs/AUDIOGEN.md), etc.
 
5
  - "music generation"
6
  - "language models"
7
  - "LLMs"
8
+ app_file: "app_batched.py"
9
  emoji: 🎵
10
+ colorFrom: gray
11
  colorTo: blue
12
  sdk: gradio
13
  sdk_version: 3.34.0
14
  pinned: true
15
  license: "cc-by-nc-4.0"
 
16
  ---
17
+ # Audiocraft
18
  ![docs badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_docs/badge.svg)
19
  ![linter badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_linter/badge.svg)
20
  ![tests badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_tests/badge.svg)
21
 
22
+ Audiocraft is a PyTorch library for deep learning research on audio generation. At the moment, it contains the code for MusicGen, a state-of-the-art controllable text-to-music model.
 
23
 
24
+ ## MusicGen
25
+
26
+ Audiocraft provides the code and models for MusicGen, [a simple and controllable model for music generation][arxiv]. MusicGen is a single stage auto-regressive
27
+ Transformer model trained over a 32kHz <a href="https://github.com/facebookresearch/encodec">EnCodec tokenizer</a> with 4 codebooks sampled at 50 Hz. Unlike existing methods like [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen doesn't require a self-supervised semantic representation, and it generates
28
+ all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict
29
+ them in parallel, thus having only 50 auto-regressive steps per second of audio.
30
+ Check out our [sample page][musicgen_samples] or test the available demo!
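The paragraph above says that a small delay between the 4 codebooks lets MusicGen predict them in parallel. Below is a minimal Python sketch of that idea. It assumes a token grid `codes[k][t]` (codebook `k`, frame `t`) and uses `None` as a placeholder for empty positions. The function names are hypothetical, and this illustrates the delay pattern only. It is not AudioCraft's own pattern code.

```python
from typing import List, Optional

# Hypothetical placeholder for grid positions that hold no token yet.
PAD: Optional[int] = None


def apply_delay_pattern(codes: List[List[int]]) -> List[List[Optional[int]]]:
    """Shift codebook k to the right by k steps.

    codes[k][t] is the token of codebook k at frame t. After the shift, column s
    holds one token per codebook (codebook k refers to frame s - k), so a model
    can emit every codebook of a column in a single auto-regressive step.
    """
    num_codebooks = len(codes)
    num_frames = len(codes[0])
    num_steps = num_frames + num_codebooks - 1
    delayed: List[List[Optional[int]]] = [[PAD] * num_steps for _ in range(num_codebooks)]
    for k in range(num_codebooks):
        for t in range(num_frames):
            delayed[k][t + k] = codes[k][t]
    return delayed


def undo_delay_pattern(delayed: List[List[Optional[int]]], num_frames: int) -> List[List[Optional[int]]]:
    """Invert apply_delay_pattern by shifting codebook k back left by k steps."""
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]


def autoregressive_steps(duration_s: float, frame_rate_hz: int = 50, num_codebooks: int = 4) -> int:
    """Number of decoding steps for one clip under the delay pattern in this sketch."""
    num_frames = int(duration_s * frame_rate_hz)
    return num_frames + num_codebooks - 1


if __name__ == "__main__":
    codes = [[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]]
    delayed = apply_delay_pattern(codes)
    for row in delayed:
        print(row)
    print(undo_delay_pattern(delayed, num_frames=3) == codes)  # True
    print(autoregressive_steps(8))  # 403
```

In this sketch, 8 seconds of audio at 50 Hz is 400 frames. Generation therefore takes 400 + 4 - 1 = 403 steps, roughly 50 per second of audio. Flattening the 4 codebooks into a single sequence would instead take 4 × 400 = 1,600 steps.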
31
+
32
+ <a target="_blank" href="https://colab.research.google.com/drive/1-Xe9NCdIs2sCUbiSmwHXozK6AAhMm7_i?usp=sharing">
33
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
34
+ </a>
35
+ <a target="_blank" href="https://huggingface.co/spaces/facebook/MusicGen">
36
+ <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in HuggingFace"/>
37
+ </a>
38
+ <br>
39
+
40
+ We use 20K hours of licensed music to train MusicGen. Specifically, we rely on an internal dataset of 10K high-quality music tracks, and on the Shutterstock and Pond5 music data.
41
 
42
  ## Installation
43
+ Audiocraft requires Python 3.9, PyTorch 2.0.0, and a GPU with at least 16 GB of memory (for the medium-sized model). To install Audiocraft, you can run the following:
44
 
45
  ```shell
46
  # Best to make sure you have torch installed first, in particular before installing xformers.
 
49
  # Then proceed to one of the following
50
  pip install -U audiocraft # stable release
51
  pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft # bleeding edge
52
+ pip install -e . # or if you cloned the repo locally
53
  ```
54
 
55
+ ## Usage
56
+ We offer a number of ways to interact with MusicGen:
57
+ 1. You can play with MusicGen by running the Jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally, or use the provided [colab notebook](https://colab.research.google.com/drive/1fxGqfg96RBUvGxZ1XXN07s3DthrKUl4-?usp=sharing).
58
+ 2. You can use the Gradio demo locally by running `python app.py`.
59
+ 3. A demo is also available on the [`facebook/MusicGen` HuggingFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
60
+ 4. Finally, you can run the [Gradio demo with a Colab GPU](https://colab.research.google.com/drive/1-Xe9NCdIs2sCUbiSmwHXozK6AAhMm7_i?usp=sharing),
61
+ as adapted from [@camenduru Colab](https://github.com/camenduru/MusicGen-colab).
62
 
+ ## API

+ We provide a simple API and 4 pre-trained models:
+ - `small`: 300M model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-small)
+ - `medium`: 1.5B model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-medium)
+ - `melody`: 1.5B model, text to music and text+melody to music - [🤗 Hub](https://huggingface.co/facebook/musicgen-melody)
+ - `large`: 3.3B model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-large)

+ We observe the best trade-off between quality and compute with the `medium` or `melody` model.
+ To use MusicGen locally, **you must have a GPU**. We recommend 16 GB of memory, but smaller
+ GPUs can still generate short sequences, or longer sequences with the `small` model.
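A quick back-of-the-envelope check helps explain the memory recommendation. The sketch below takes the parameter counts and Hub IDs from the model list above and multiplies by 4 bytes per parameter (float32) or 2 (float16). It counts only the language model's weights: activations, the attention cache, the EnCodec decoder and the text conditioner all come on top, which is presumably why 16 GB is recommended rather than the bare weight size. Which precision Audiocraft actually uses at inference is not stated here, and `MODELS` and `weight_memory_gb` are illustrative names, not part of the library.

```python
from typing import Dict, Tuple

# Parameter counts and Hub IDs as listed in the README's model list.
MODELS: Dict[str, Tuple[float, str]] = {
    "small": (300e6, "facebook/musicgen-small"),
    "medium": (1.5e9, "facebook/musicgen-medium"),
    "melody": (1.5e9, "facebook/musicgen-melody"),
    "large": (3.3e9, "facebook/musicgen-large"),
}

BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(name: str, dtype: str = "float32") -> float:
    """Memory needed just to hold the LM weights, in GB (1e9 bytes)."""
    num_params, _hub_id = MODELS[name]
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name in MODELS:
        print(f"{name:>6}: {weight_memory_gb(name):5.1f} GB float32, "
              f"{weight_memory_gb(name, 'float16'):5.1f} GB float16")
```

By this estimate the `large` model's weights alone take about 13.2 GB in float32, which leaves little headroom on a 16 GB card and is consistent with the demo's warning that `large` may run out of memory on the longest sequences.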

+ **Note**: Please make sure to have [ffmpeg](https://ffmpeg.org/download.html) installed when using newer versions of `torchaudio`.
+ You can install it with:
+ ```shell
+ apt-get install ffmpeg
+ ```

+ Below is a quick example of using the API.

+ ```python
+ import torchaudio
+ from audiocraft.models import MusicGen
+ from audiocraft.data.audio import audio_write

+ model = MusicGen.get_pretrained('melody')
+ model.set_generation_params(duration=8) # generate 8 seconds.
+ wav = model.generate_unconditional(4) # generates 4 unconditional audio samples
+ descriptions = ['happy rock', 'energetic EDM', 'sad jazz']
+ wav = model.generate(descriptions) # generates 3 samples.

+ melody, sr = torchaudio.load('./assets/bach.mp3')
+ # generates using the melody from the given audio and the provided descriptions.
+ wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)

+ for idx, one_wav in enumerate(wav):
+     # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
+     audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
+ ```

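The `strategy="loudness"` option of `audio_write` rescales each clip toward a target loudness. As a rough sketch of the arithmetic only, assume the clip's integrated loudness has already been measured in LUFS (the measurement itself is out of scope here) and that the target is the negative of the headroom, matching the "-14 dB LUFS" default quoted above. The gain is then the dB difference converted to a linear factor. `loudness_gain` is a hypothetical helper, not Audiocraft's implementation.

```python
def loudness_gain(measured_lufs: float, headroom_db: float = 14.0) -> float:
    """Linear gain that moves a clip from `measured_lufs` to -headroom_db LUFS.

    Assumption: the target loudness is -headroom_db, mirroring the README's
    "-14 dB LUFS" for the default. Illustrative helper only.
    """
    target_lufs = -headroom_db
    gain_db = target_lufs - measured_lufs
    # Convert a dB change in amplitude to a linear multiplier.
    return 10 ** (gain_db / 20)


if __name__ == "__main__":
    # A clip measured at -20 LUFS needs +6 dB to reach -14 LUFS (about x2.0).
    print(round(loudness_gain(-20.0), 3))
    # With app.py's loudness_headroom_db=16, the same clip only gets +4 dB.
    print(round(loudness_gain(-20.0, headroom_db=16.0), 3))
```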
+ ## Model Card

+ See [the model card page](./MODEL_CARD.md).

+ ## FAQ

+ #### Will the training code be released?

+ Yes. We will soon release the training code for MusicGen and EnCodec.

+ #### I need help on Windows
+
+ @FurkanGozukara made a complete tutorial for [Audiocraft/MusicGen on Windows](https://youtu.be/v-YpvPkhdO4).

+ ## Citation
  ```
  @article{copet2023simple,
+     title={Simple and Controllable Music Generation},
+     author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
+     year={2023},
+     journal={arXiv preprint arXiv:2306.05284},
  }
  ```

+ ## License
+ * The code in this repository is released under the MIT license as found in the [LICENSE file](LICENSE).
+ * The weights in this repository are released under the CC-BY-NC 4.0 license as found in the [LICENSE_weights file](LICENSE_weights).
+
+ [arxiv]: https://arxiv.org/abs/2306.05284
+ [musicgen_samples]: https://ai.honu.io/papers/musicgen/
app.py ADDED
@@ -0,0 +1,216 @@
+ """
+ Copyright (c) Meta Platforms, Inc. and affiliates.
+ All rights reserved.
+
+ This source code is licensed under the license found in the
+ LICENSE file in the root directory of this source tree.
+ """
+
+ from tempfile import NamedTemporaryFile
+ import argparse
+ import torch
+ import gradio as gr
+ import os
+ from audiocraft.models import MusicGen
+ from audiocraft.data.audio import audio_write
+
+ MODEL = None
+ IS_SHARED_SPACE = "musicgen/MusicGen" in os.environ.get('SPACE_ID', '')
+
+
+ def load_model(version):
+     print("Loading model", version)
+     return MusicGen.get_pretrained(version)
+
+
+ def predict(model, text, melody, duration, topk, topp, temperature, cfg_coef):
+     global MODEL
+     topk = int(topk)
+     if MODEL is None or MODEL.name != model:
+         MODEL = load_model(model)
+
+     if duration > MODEL.lm.cfg.dataset.segment_duration:
+         raise gr.Error("MusicGen currently supports durations of up to 30 seconds!")
+     MODEL.set_generation_params(
+         use_sampling=True,
+         top_k=topk,
+         top_p=topp,
+         temperature=temperature,
+         cfg_coef=cfg_coef,
+         duration=duration,
+     )
+
+     if melody:
+         sr, melody = melody[0], torch.from_numpy(melody[1]).to(MODEL.device).float().t().unsqueeze(0)
+         print(melody.shape)
+         if melody.dim() == 2:
+             melody = melody[None]
+         melody = melody[..., :int(sr * MODEL.lm.cfg.dataset.segment_duration)]
+         output = MODEL.generate_with_chroma(
+             descriptions=[text],
+             melody_wavs=melody,
+             melody_sample_rate=sr,
+             progress=False
+         )
+     else:
+         output = MODEL.generate(descriptions=[text], progress=False)
+
+     output = output.detach().cpu().float()[0]
+     with NamedTemporaryFile("wb", suffix=".wav", delete=False) as file:
+         audio_write(
+             file.name, output, MODEL.sample_rate, strategy="loudness",
+             loudness_headroom_db=16, loudness_compressor=True, add_suffix=False)
+         waveform_video = gr.make_waveform(file.name)
+     return waveform_video
+
+
67
+ def ui(**kwargs):
68
+ with gr.Blocks() as interface:
69
+ gr.Markdown(
70
+ """
71
+ # MusicGen
72
+ This is your private demo for [MusicGen](https://github.com/facebookresearch/audiocraft), a simple and controllable model for music generation
73
+ presented at: ["Simple and Controllable Music Generation"](https://huggingface.co/papers/2306.05284)
74
+ """
75
+ )
76
+ if IS_SHARED_SPACE:
77
+ gr.Markdown("""
78
+ ⚠ This Space doesn't work in this shared UI ⚠
79
+
80
+ <a href="https://huggingface.co/spaces/musicgen/MusicGen?duplicate=true" style="display: inline-block;margin-top: .5em;margin-right: .25em;" target="_blank">
81
+ <img style="margin-bottom: 0em;display: inline;margin-top: -.25em;" src="https://bit.ly/3gLdBN6" alt="Duplicate Space"></a>
82
+ to use it privately, or use the <a href="https://huggingface.co/spaces/facebook/MusicGen">public demo</a>
83
+ """)
84
+ with gr.Row():
85
+ with gr.Column():
86
+ with gr.Row():
87
+ text = gr.Text(label="Input Text", interactive=True)
88
+ melody = gr.Audio(source="upload", type="numpy", label="Melody Condition (optional)", interactive=True)
89
+ with gr.Row():
90
+ submit = gr.Button("Submit")
91
+ with gr.Row():
92
+ model = gr.Radio(["melody", "medium", "small", "large"], label="Model", value="melody", interactive=True)
93
+ with gr.Row():
94
+ duration = gr.Slider(minimum=1, maximum=30, value=10, label="Duration", interactive=True)
95
+ with gr.Row():
96
+ topk = gr.Number(label="Top-k", value=250, interactive=True)
97
+ topp = gr.Number(label="Top-p", value=0, interactive=True)
98
+ temperature = gr.Number(label="Temperature", value=1.0, interactive=True)
99
+ cfg_coef = gr.Number(label="Classifier Free Guidance", value=3.0, interactive=True)
100
+ with gr.Column():
101
+ output = gr.Video(label="Generated Music")
102
+ submit.click(predict, inputs=[model, text, melody, duration, topk, topp, temperature, cfg_coef], outputs=[output])
103
+ gr.Examples(
104
+ fn=predict,
105
+ examples=[
106
+ [
107
+ "An 80s driving pop song with heavy drums and synth pads in the background",
108
+ "./assets/bach.mp3",
109
+ "melody"
110
+ ],
111
+ [
112
+ "A cheerful country song with acoustic guitars",
113
+ "./assets/bolero_ravel.mp3",
114
+ "melody"
115
+ ],
116
+ [
117
+ "90s rock song with electric guitar and heavy drums",
118
+ None,
119
+ "medium"
120
+ ],
121
+ [
122
+ "a light and cheerly EDM track, with syncopated drums, aery pads, and strong emotions",
123
+ "./assets/bach.mp3",
124
+ "melody"
125
+ ],
126
+ [
127
+ "lofi slow bpm electro chill with organic samples",
128
+ None,
129
+ "medium",
130
+ ],
131
+ ],
132
+ inputs=[text, melody, model],
133
+ outputs=[output]
134
+ )
135
+ gr.Markdown(
136
+ """
137
+ ### More details
138
+
139
+ The model will generate a short music extract based on the description you provided.
140
+ You can generate up to 30 seconds of audio.
141
+
142
+ We present 4 model variations:
143
+ 1. Melody -- a music generation model capable of generating music condition on text and melody inputs. **Note**, you can also use text only.
144
+ 2. Small -- a 300M transformer decoder conditioned on text only.
145
+ 3. Medium -- a 1.5B transformer decoder conditioned on text only.
146
+ 4. Large -- a 3.3B transformer decoder conditioned on text only (might OOM for the longest sequences.)
147
+
148
+ When using `melody`, ou can optionaly provide a reference audio from
149
+ which a broad melody will be extracted. The model will then try to follow both the description and melody provided.
150
+
151
+ You can also use your own GPU or a Google Colab by following the instructions on our repo.
152
+ See [github.com/facebookresearch/audiocraft](https://github.com/facebookresearch/audiocraft)
153
+ for more details.
154
+ """
155
+ )
156
+
157
+ # Show the interface
158
+ launch_kwargs = {}
159
+ username = kwargs.get('username')
160
+ password = kwargs.get('password')
161
+ server_port = kwargs.get('server_port', 0)
162
+ inbrowser = kwargs.get('inbrowser', False)
163
+ share = kwargs.get('share', False)
164
+ server_name = kwargs.get('listen')
165
+
166
+ launch_kwargs['server_name'] = server_name
167
+
168
+ if username and password:
169
+ launch_kwargs['auth'] = (username, password)
170
+ if server_port > 0:
171
+ launch_kwargs['server_port'] = server_port
172
+ if inbrowser:
173
+ launch_kwargs['inbrowser'] = inbrowser
174
+ if share:
175
+ launch_kwargs['share'] = share
176
+
177
+ interface.queue().launch(**launch_kwargs, max_threads=1)
178
+
179
+
180
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser()
+     parser.add_argument(
+         '--listen',
+         type=str,
+         default='0.0.0.0',
+         help='IP to listen on for connections to Gradio',
+     )
+     parser.add_argument(
+         '--username', type=str, default='', help='Username for authentication'
+     )
+     parser.add_argument(
+         '--password', type=str, default='', help='Password for authentication'
+     )
+     parser.add_argument(
+         '--server_port',
+         type=int,
+         default=0,
+         help='Port to run the server listener on',
+     )
+     parser.add_argument(
+         '--inbrowser', action='store_true', help='Open in browser'
+     )
+     parser.add_argument(
+         '--share', action='store_true', help='Share the gradio UI'
+     )
+
+     args = parser.parse_args()
+
+     ui(
+         username=args.username,
+         password=args.password,
+         inbrowser=args.inbrowser,
+         server_port=args.server_port,
+         share=args.share,
+         listen=args.listen
+     )
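`ui()` turns the command-line flags into keyword arguments for Gradio's `launch()`. Below is a minimal sketch of that mapping as a single pure function, which makes the rules testable without starting a server. `build_launch_kwargs` is a hypothetical name, not part of `app.py`; the flag names and defaults are copied from the argument parser above. Authentication is enabled only when both username and password are non-empty, a port of `0` means "let Gradio choose", and `inbrowser`/`share` are forwarded only when set.

```python
import argparse
from typing import Any, Dict, List, Optional


def build_launch_kwargs(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Parse app.py-style flags and return the keyword arguments for launch()."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--listen', type=str, default='0.0.0.0')
    parser.add_argument('--username', type=str, default='')
    parser.add_argument('--password', type=str, default='')
    parser.add_argument('--server_port', type=int, default=0)
    parser.add_argument('--inbrowser', action='store_true')
    parser.add_argument('--share', action='store_true')
    args = parser.parse_args(argv)

    launch_kwargs: Dict[str, Any] = {'server_name': args.listen}
    # Same rules as ui(): forward only the options that were actually set.
    if args.username and args.password:
        launch_kwargs['auth'] = (args.username, args.password)
    if args.server_port > 0:
        launch_kwargs['server_port'] = args.server_port
    if args.inbrowser:
        launch_kwargs['inbrowser'] = True
    if args.share:
        launch_kwargs['share'] = True
    return launch_kwargs


if __name__ == "__main__":
    print(build_launch_kwargs([]))
    # {'server_name': '0.0.0.0'}
    print(build_launch_kwargs(['--username', 'me', '--password', 'pw', '--server_port', '7860']))
    # {'server_name': '0.0.0.0', 'auth': ('me', 'pw'), 'server_port': 7860}
```

Note that passing only `--username` silently disables authentication, which is easy to miss when the logic is spread across `ui()`.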
app_batched.py ADDED
@@ -0,0 +1,222 @@
+ """
+ Copyright (c) Meta Platforms, Inc. and affiliates.
+ All rights reserved.
+
+ This source code is licensed under the license found in the
+ LICENSE file in the root directory of this source tree.
+ """
+
+ import argparse
+ from concurrent.futures import ProcessPoolExecutor
+ import subprocess as sp
+ from tempfile import NamedTemporaryFile
+ import time
+ import warnings
+ import torch
+ import gradio as gr
+ from audiocraft.data.audio_utils import convert_audio
+ from audiocraft.data.audio import audio_write
+ from audiocraft.models import MusicGen
+
+
+ MODEL = None
+
+ _old_call = sp.call
+
+
+ def _call_nostderr(*args, **kwargs):
+     # Keep ffmpeg from flooding the logs.
+     kwargs['stderr'] = sp.DEVNULL
+     kwargs['stdout'] = sp.DEVNULL
+     # Pass the exit status through, as sp.call itself does.
+     return _old_call(*args, **kwargs)
+
+
+ sp.call = _call_nostderr
+ pool = ProcessPoolExecutor(3)
+ pool.__enter__()
+
+
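Replacing `sp.call` globally affects every caller in the process, including code that checks the exit status, so the wrapper has to pass `subprocess.call`'s return value through. Here is a self-contained sketch of the same silencing pattern as a local wrapper rather than a monkey-patch. `call_quietly` is a hypothetical name, and `sys.executable` stands in for ffmpeg so the example runs anywhere.

```python
import subprocess
import sys


def call_quietly(*args, **kwargs) -> int:
    """Run subprocess.call with stdout/stderr discarded and return its exit code."""
    kwargs['stdout'] = subprocess.DEVNULL
    kwargs['stderr'] = subprocess.DEVNULL
    return subprocess.call(*args, **kwargs)


if __name__ == "__main__":
    # A child process that writes to both streams and exits with status 3.
    code = call_quietly([
        sys.executable, "-c",
        "import sys; print('noise'); print('more noise', file=sys.stderr); sys.exit(3)",
    ])
    print("exit status:", code)  # the child's own output is discarded
```

A local wrapper is safer in general, but it only covers calls you make yourself. `app_batched.py` presumably patches the module attribute because the ffmpeg invocation happens inside `gr.make_waveform`, where a local wrapper cannot reach.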
+ def make_waveform(*args, **kwargs):
+     be = time.time()
+     with warnings.catch_warnings():
+         warnings.simplefilter('ignore')
+         out = gr.make_waveform(*args, **kwargs)
+         print("Make a video took", time.time() - be)
+         return out
+
+
+ def load_model():
+     print("Loading model")
+     return MusicGen.get_pretrained("melody")
+
+
+ def predict(texts, melodies):
+     global MODEL
+     if MODEL is None:
+         MODEL = load_model()
+
+     duration = 12
+     max_text_length = 512
+     texts = [text[:max_text_length] for text in texts]
+     MODEL.set_generation_params(duration=duration)
+
+     print("new batch", len(texts), texts, [None if m is None else (m[0], m[1].shape) for m in melodies])
+     be = time.time()
+     processed_melodies = []
+     target_sr = 32000
+     target_ac = 1
+     for melody in melodies:
+         if melody is None:
+             processed_melodies.append(None)
+         else:
+             sr, melody = melody[0], torch.from_numpy(melody[1]).to(MODEL.device).float().t()
+             if melody.dim() == 1:
+                 melody = melody[None]
+             melody = melody[..., :int(sr * duration)]
+             melody = convert_audio(melody, sr, target_sr, target_ac)
+             processed_melodies.append(melody)
+
+     outputs = MODEL.generate_with_chroma(
+         descriptions=texts,
+         melody_wavs=processed_melodies,
+         melody_sample_rate=target_sr,
+         progress=False
+     )
+
+     outputs = outputs.detach().cpu().float()
+     out_files = []
+     for output in outputs:
+         with NamedTemporaryFile("wb", suffix=".wav", delete=False) as file:
+             audio_write(
+                 file.name, output, MODEL.sample_rate, strategy="loudness",
+                 loudness_headroom_db=16, loudness_compressor=True, add_suffix=False)
+             out_files.append(pool.submit(make_waveform, file.name))
+     res = [[out_file.result() for out_file in out_files]]
+     print("batch finished", len(texts), time.time() - be)
+     return res
+
+
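The batched `predict` overlaps the slow video rendering by submitting one `make_waveform` job per generated clip to the 3-worker pool, then calling `.result()` in submission order, so the outputs line up with the input prompts even if jobs finish out of order. Below is a minimal sketch of that submit-then-collect pattern. It swaps in `ThreadPoolExecutor` and a dummy `render` function (both assumptions, chosen so the example runs anywhere without pickling concerns); `ProcessPoolExecutor` exposes the same `submit`/`result` interface.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def render(name: str) -> str:
    """Stand-in for make_waveform: sleeps a random amount, then returns a label."""
    time.sleep(random.uniform(0.0, 0.05))
    return f"{name}.mp4"


def render_batch(names: List[str], workers: int = 3) -> List[str]:
    """Submit one job per item, then collect results in submission order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(render, name) for name in names]
        # result() blocks until that particular job is done, so the output
        # order matches the input order regardless of completion order.
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(render_batch(["clip0", "clip1", "clip2", "clip3"]))
    # ['clip0.mp4', 'clip1.mp4', 'clip2.mp4', 'clip3.mp4']
```

The extra outer list in `res = [[...]]` is, as far as Gradio's batching convention goes, there because a batched function returns one list per output component, and this demo has a single `gr.Video` output.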
+ def ui(**kwargs):
+     with gr.Blocks() as demo:
+         gr.Markdown(
+             """
+ # MusicGen
+
+ This is the demo for [MusicGen](https://github.com/facebookresearch/audiocraft), a simple and controllable model for music generation
+ presented at: ["Simple and Controllable Music Generation"](https://huggingface.co/papers/2306.05284).
+ <br/>
+ <a href="https://huggingface.co/spaces/musicgen/MusicGen?duplicate=true" style="display: inline-block;margin-top: .5em;margin-right: .25em;" target="_blank">
+ <img style="margin-bottom: 0em;display: inline;margin-top: -.25em;" src="https://bit.ly/3gLdBN6" alt="Duplicate Space"></a>
+ for longer sequences, more control and no queue.</p>
+ """
+         )
+         with gr.Row():
+             with gr.Column():
+                 with gr.Row():
+                     text = gr.Text(label="Describe your music", lines=2, interactive=True)
+                     melody = gr.Audio(source="upload", type="numpy", label="Condition on a melody (optional)", interactive=True)
+                 with gr.Row():
+                     submit = gr.Button("Generate")
+             with gr.Column():
+                 output = gr.Video(label="Generated Music")
+         submit.click(predict, inputs=[text, melody], outputs=[output], batch=True, max_batch_size=8)
+         gr.Examples(
+             fn=predict,
+             examples=[
+                 [
+                     "An 80s driving pop song with heavy drums and synth pads in the background",
+                     "./assets/bach.mp3",
+                 ],
+                 [
+                     "A cheerful country song with acoustic guitars",
+                     "./assets/bolero_ravel.mp3",
+                 ],
+                 [
+                     "90s rock song with electric guitar and heavy drums",
+                     None,
+                 ],
+                 [
+                     "a light and cheerly EDM track, with syncopated drums, aery pads, and strong emotions bpm: 130",
+                     "./assets/bach.mp3",
+                 ],
+                 [
+                     "lofi slow bpm electro chill with organic samples",
+                     None,
+                 ],
+             ],
+             inputs=[text, melody],
+             outputs=[output]
+         )
+         gr.Markdown("""
+ ### More details
+
+ The model will generate 12 seconds of audio based on the description you provided.
+ You can optionally provide a reference audio from which a broad melody will be extracted.
+ The model will then try to follow both the description and melody provided.
+ All samples are generated with the `melody` model.
+
+ You can also use your own GPU or a Google Colab by following the instructions on our repo.
+
+ See [github.com/facebookresearch/audiocraft](https://github.com/facebookresearch/audiocraft)
+ for more details.
+ """)
+
+     # Show the interface
+     launch_kwargs = {}
+     username = kwargs.get('username')
+     password = kwargs.get('password')
+     server_port = kwargs.get('server_port', 0)
+     inbrowser = kwargs.get('inbrowser', False)
+     share = kwargs.get('share', False)
+     server_name = kwargs.get('listen')
+
+     launch_kwargs['server_name'] = server_name
+
+     if username and password:
+         launch_kwargs['auth'] = (username, password)
+     if server_port > 0:
+         launch_kwargs['server_port'] = server_port
+     if inbrowser:
+         launch_kwargs['inbrowser'] = inbrowser
+     if share:
+         launch_kwargs['share'] = share
+     demo.queue(max_size=8 * 4).launch(**launch_kwargs)
+
+
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser()
+     parser.add_argument(
+         '--listen',
+         type=str,
+         default='0.0.0.0',
+         help='IP to listen on for connections to Gradio',
+     )
+     parser.add_argument(
+         '--username', type=str, default='', help='Username for authentication'
+     )
+     parser.add_argument(
+         '--password', type=str, default='', help='Password for authentication'
+     )
+     parser.add_argument(
+         '--server_port',
+         type=int,
+         default=0,
+         help='Port to run the server listener on',
+     )
+     parser.add_argument(
+         '--inbrowser', action='store_true', help='Open in browser'
+     )
+     parser.add_argument(
+         '--share', action='store_true', help='Share the gradio UI'
+     )
+
+     args = parser.parse_args()
+
+     ui(
+         username=args.username,
+         password=args.password,
+         inbrowser=args.inbrowser,
+         server_port=args.server_port,
+         share=args.share,
+         listen=args.listen
+     )
assets/a_duck_quacking_as_birds_chirp_and_a_pigeon_cooing.mp3 DELETED
Binary file (15.2 kB)
 
assets/sirens_and_a_humming_engine_approach_and_pass.mp3 DELETED
Binary file (15.2 kB)
 
audiocraft/__init__.py CHANGED
@@ -3,24 +3,8 @@
  #
  # This source code is licensed under the license found in the
  # LICENSE file in the root directory of this source tree.
- """
- AudioCraft is a general framework for training audio generative models.
- At the moment we provide the training code for:
-
- - [MusicGen](https://arxiv.org/abs/2306.05284), a state-of-the-art
- text-to-music and melody+text autoregressive generative model.
- For the solver, see `audiocraft.solvers.musicgen.MusicGenSolver`, and for the model,
- `audiocraft.models.musicgen.MusicGen`.
- - [AudioGen](https://arxiv.org/abs/2209.15352), a state-of-the-art
- text-to-general-audio generative model.
- - [EnCodec](https://arxiv.org/abs/2210.13438), efficient and high fidelity
- neural audio codec which provides an excellent tokenizer for autoregressive language models.
- See `audiocraft.solvers.compression.CompressionSolver`, and `audiocraft.models.encodec.EncodecModel`.
- - [MultiBandDiffusion](TODO), alternative diffusion-based decoder compatible with EnCodec that
- improves the perceived quality and reduces the artifacts coming from adversarial decoders.
- """

  # flake8: noqa
  from . import data, modules, models

- __version__ = '1.1.0'
+ __version__ = '0.0.2a1'
audiocraft/adversarial/__init__.py DELETED
@@ -1,22 +0,0 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # All rights reserved.
- #
- # This source code is licensed under the license found in the
- # LICENSE file in the root directory of this source tree.
- """Adversarial losses and discriminator architectures."""
-
- # flake8: noqa
- from .discriminators import (
-     MultiPeriodDiscriminator,
-     MultiScaleDiscriminator,
-     MultiScaleSTFTDiscriminator
- )
- from .losses import (
-     AdversarialLoss,
-     AdvLossType,
-     get_adv_criterion,
-     get_fake_criterion,
-     get_real_criterion,
-     FeatLossType,
-     FeatureMatchingLoss
- )