Simple and Controllable Music Generation
We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchically or via upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples while being conditioned on a textual description or melodic features, allowing better control over the generated output. We conduct an extensive empirical evaluation, considering both automatic and human studies, showing that the proposed approach is superior to the evaluated baselines on a standard text-to-music benchmark. Through ablation studies, we shed light on the importance of each of the components comprising MusicGen. Music samples, code, and models are available at https://github.com/facebookresearch/audiocraft.
With this, I think it's safe to say that Meta is more open than OpenAI with its models.
OpenAI knows more about music models than Meta by far.
Very interesting paper
Is there a way this model could be fine-tuned at this point?
Is there a way to fine-tune this using the QLoRA method?
Can anyone help me understand the different parameters? I am trying to create some music based on humming, but it doesn't do well.
Does it work if I just hum a melody? I would feel so stupid to try that and fail miserably xD
Can we deploy the Melody model to AWS SageMaker?
Rap trip hop
Hey, I can't find documentation on the parameters. Can somebody link to what "top-k" does and things like that?
TL;DR: a higher top-k value typically makes the output more diverse and creative, but might also increase its likelihood of straying from the context. A lower value makes it less diverse and more deterministic; k=1 will always give you the same answer (useful for customer support, for example).
- It's also not advised to modify both the top-p and top-k values at the same time.
Top-k, top-p, and temperature are all inference-time parameters that affect how tokens are generated by an LLM.
Top-k and top-p are just sampling strategies. They aren't specific to LLMs, or even to neural networks at all.
Top-k means picking from among the top tokens: the model considers only the k most probable choices for the next token, where k is a predefined positive integer.
For example, if k = 5 and the model produces the following probabilities for the next word in the sequence:
Word A: 0.3
Word B: 0.25
Word C: 0.2
Word D: 0.15
Word E: 0.1
With top-k sampling, the model would randomly select one of the top 5 words based on their probabilities. So, it might choose Word A (30% chance), Word B (25% chance), Word C (20% chance), Word D (15% chance), or Word E (10% chance) as the next word in the sequence.
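The worked example above can be reproduced in a few lines of plain Python. This is a minimal sketch, not any library's actual implementation; the `top_k_sample` helper and the single-letter word names are hypothetical:

```python
import random

# Probabilities from the example above (Word A through Word E).
WORD_PROBS = {"A": 0.30, "B": 0.25, "C": 0.20, "D": 0.15, "E": 0.10}

def top_k_sample(probs, k, rng):
    """Keep only the k most probable tokens, renormalize, and sample."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    r = rng.random() * total
    acc = 0.0
    for token, p in top:
        acc += p
        if r <= acc:
            return token
    return top[-1][0]  # guard against floating-point rounding

rng = random.Random(0)
# k=5: any of the five words can appear, weighted by its probability.
print(top_k_sample(WORD_PROBS, 5, rng))
# k=1: deterministic -- always the single most probable word, "A".
assert top_k_sample(WORD_PROBS, 1, rng) == "A"
```

This also illustrates the TL;DR above: with k=1 the renormalized distribution collapses to a single token, so sampling always returns the same answer.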
How can we control the duration of the audio output with Hugging Face? In the Audiocraft library from Meta we have an option,
but in Hugging Face I don't see how we can control it.
You gotta clone the space to get the slider to show up. The FB one is like 30 seconds but the cloned one is up to 2 minutes or so. Maybe you are having a different tech problem but mine just has a big slider where you can set the number of seconds, after cloning the space.
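If you're using the `transformers` integration rather than a Space, duration is set indirectly through `max_new_tokens`: MusicGen generates roughly 50 audio tokens per second. A sketch of the conversion (the `seconds_to_tokens` helper is hypothetical; the checkpoint name assumes `facebook/musicgen-small`):

```python
# MusicGen's audio codec produces roughly 50 token frames per second
# of generated audio (assumption based on the 32 kHz EnCodec setup).
FRAME_RATE_HZ = 50

def seconds_to_tokens(duration_s: float) -> int:
    """Convert a target duration in seconds to a max_new_tokens value."""
    return int(duration_s * FRAME_RATE_HZ)

# Usage (requires downloading the checkpoint, so shown but not run here):
# from transformers import AutoProcessor, MusicgenForConditionalGeneration
# processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
# model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
# inputs = processor(text=["lo-fi rap trip hop beat"], return_tensors="pt")
# audio = model.generate(**inputs, max_new_tokens=seconds_to_tokens(10))

print(seconds_to_tokens(5))  # ~5 seconds of audio
```

So a 30-second clip would need around 1500 new tokens; longer requests grow memory and compute linearly.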
Tried playing around with it and it really doesn't seem to like me asking it to make something in A harmonic minor (guessing there isn't much of that in the dataset).
I tried harmonic minor too, and the app worked on it for more than 24 hours and then gave a "file not available" error. I asked it to do a diminished chord too; not sure if that messed it up as well. Please train the AI on harmonic minor, because it's Halloween season and you gotta have harmonic minor for #spookyseason.
Hi, can I take this model and train it on a custom set of music?