Simple and Controllable Music Generation
We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchically or via upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples while being conditioned on a textual description or melodic features, allowing better control over the generated output. We conduct an extensive empirical evaluation, considering both automatic and human studies, showing that the proposed approach is superior to the evaluated baselines on a standard text-to-music benchmark. Through ablation studies, we shed light on the importance of each of the components comprising MusicGen. Music samples, code, and models are available at https://github.com/facebookresearch/audiocraft.
With this, I think it's safe to say that Meta is more open than OpenAI with its models.
OpenAI knows more about music models than Meta by far.
Very interesting paper
Is there a way this model could be fine-tuned at this point?
Is there a way to fine-tune this using the QLoRA method?
Can anyone help me understand the different parameters? I am trying to create some music based on humming, but it doesn't do well.
Does it work if I just hum a melody? I would feel so stupid to try that and fail miserably xD
Can we deploy the Melody model to AWS SageMaker?
Rap trip hop
Hey, I can't find documentation on the parameters. Can somebody link to what "top-k" does and things like that?
TL;DR: a higher top-k value typically makes the output more diverse and creative, but might also increase its likelihood of straying from the context. A lower value makes it less diverse and more deterministic; k=1 will always give you the same answer (useful for customer support, for example).
- It's also not advised to modify both the top-p and top-k values at the same time.
Top-k, top-p, and temperature are all inference-time parameters that affect how tokens are generated by an LLM.
Top-k and top-p are just sampling strategies. They aren't specific to LLMs, or even to neural networks at all.
Top-k means picking from among the top tokens: the model considers only the k most probable choices for the next token, where k is a predefined positive integer.
For example, if k = 5 and the model produces the following probabilities for the next word in the sequence:
Word A: 0.3
Word B: 0.25
Word C: 0.2
Word D: 0.15
Word E: 0.1
With top-k sampling, the model would randomly select one of the top 5 words based on their probabilities. So, it might choose Word A (30% chance), Word B (25% chance), Word C (20% chance), Word D (15% chance), or Word E (10% chance) as the next word in the sequence.
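The worked example above can be reproduced in a few lines of plain Python. This is a minimal sketch, not any library's actual implementation; the `top_k_sample` helper and the single-letter word names are hypothetical:

```python
import random

# Probabilities from the example above (Word A through Word E).
WORD_PROBS = {"A": 0.30, "B": 0.25, "C": 0.20, "D": 0.15, "E": 0.10}

def top_k_sample(probs, k, rng):
    """Keep only the k most probable tokens, renormalize, and sample."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    r = rng.random() * total
    acc = 0.0
    for token, p in top:
        acc += p
        if r <= acc:
            return token
    return top[-1][0]  # guard against floating-point rounding

rng = random.Random(0)
# k=5: any of the five words can appear, weighted by its probability.
print(top_k_sample(WORD_PROBS, 5, rng))
# k=1: deterministic -- always the single most probable word, "A".
assert top_k_sample(WORD_PROBS, 1, rng) == "A"
```

This also illustrates the TL;DR above: with k=1 the renormalized distribution collapses to a single token, so sampling always returns the same answer.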
How can we control the duration of the audio output with Hugging Face? In the Audiocraft library from Meta we have an option,
but in Hugging Face I don't see how we can control it.
You gotta clone the space to get the slider to show up. The FB one is like 30 seconds but the cloned one is up to 2 minutes or so. Maybe you are having a different tech problem but mine just has a big slider where you can set the number of seconds, after cloning the space.
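If you're using the `transformers` integration rather than a Space, duration is set indirectly through `max_new_tokens`: MusicGen generates roughly 50 audio tokens per second. A sketch of the conversion (the `seconds_to_tokens` helper is hypothetical; the checkpoint name assumes `facebook/musicgen-small`):

```python
# MusicGen's audio codec produces roughly 50 token frames per second
# of generated audio (assumption based on the 32 kHz EnCodec setup).
FRAME_RATE_HZ = 50

def seconds_to_tokens(duration_s: float) -> int:
    """Convert a target duration in seconds to a max_new_tokens value."""
    return int(duration_s * FRAME_RATE_HZ)

# Usage (requires downloading the checkpoint, so shown but not run here):
# from transformers import AutoProcessor, MusicgenForConditionalGeneration
# processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
# model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
# inputs = processor(text=["lo-fi rap trip hop beat"], return_tensors="pt")
# audio = model.generate(**inputs, max_new_tokens=seconds_to_tokens(10))

print(seconds_to_tokens(5))  # ~5 seconds of audio
```

So a 30-second clip would need around 1500 new tokens; longer requests grow memory and compute linearly.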
Tried playing around with it and it really doesn't seem to like me asking it to make something in A harmonic minor (guessing there isn't much of that in the dataset).
I tried harmonic minor too, and the app worked on it for more than 24 hours and then gave a "file not available" error. I asked it to do a diminished chord too; not sure if that messed it up as well. Please train the AI on harmonic minor, because it's Halloween season and you gotta have harmonic minor for #spookyseason.
Hi, can I take this model and train it on a custom set of music?