# MusicGen
Welcome to MusicGen's demo Jupyter notebook. Here you will find a series of self-contained examples of how to use MusicGen in different settings.

First, we initialize MusicGen. You can choose a model from the following selection:
1. `small` - 300M transformer decoder.
2. `medium` - 1.5B transformer decoder.
3. `melody` - 1.5B transformer decoder also supporting melody conditioning.
4. `large` - 3.3B transformer decoder.

We will use the `small` variant for the purpose of this demonstration.

In [None]:
from audiocraft.models import MusicGen

# Using the small model; better results would be obtained with `medium` or `large`.
model = MusicGen.get_pretrained('small')

Next, let us configure the generation parameters. Specifically, you can control the following:
* `use_sampling` (bool, optional): use sampling if True, else do argmax decoding. Defaults to True.
* `top_k` (int, optional): top_k used for sampling. Defaults to 250.
* `top_p` (float, optional): top_p used for sampling. When set to 0, top_k is used instead. Defaults to 0.0.
* `temperature` (float, optional): softmax temperature parameter. Defaults to 1.0.
* `duration` (float, optional): duration of the generated waveform, in seconds. Defaults to 30.0.
* `cfg_coef` (float, optional): coefficient used for classifier free guidance. Defaults to 3.0.

Any parameter you do not pass to `set_generation_params` falls back to its default value.
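
To make the sampling parameters listed above more concrete, here is a minimal, standard-library sketch of how `use_sampling`, `temperature`, `top_k`, and `top_p` typically interact when picking the next token. It is not MusicGen's actual implementation: the helper names `softmax`, `filter_probs`, and `pick_token` are hypothetical, and the behavior when both `top_k` and `top_p` are 0 (sample from the full distribution) is an assumption.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_probs(probs, top_k=250, top_p=0.0):
    """Zero out unlikely tokens, then renormalize the rest.

    Follows the documented rule: top_p is used when it is > 0,
    otherwise top_k is used.
    """
    # Token indices sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_p > 0.0:
        # Keep the smallest prefix whose cumulative mass reaches top_p.
        kept, mass = [], 0.0
        for i in order:
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
    elif top_k > 0:
        kept = order[:top_k]
    else:
        kept = order  # assumption: no filtering when both are 0
    kept_set = set(kept)
    total = sum(probs[i] for i in kept_set)
    return [p / total if i in kept_set else 0.0 for i, p in enumerate(probs)]


def pick_token(logits, use_sampling=True, top_k=250, top_p=0.0,
               temperature=1.0, rng=random):
    """Pick one token index from a list of logits."""
    if not use_sampling:
        # Argmax decoding: always take the most likely token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = filter_probs(softmax(logits, temperature), top_k, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In this sketch, lowering `temperature` sharpens the distribution before filtering, while a smaller `top_k` or `top_p` restricts sampling to fewer candidates; both make the output less varied.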

In [None]:
model.set_generation_params(
    use_sampling=True,
    top_k=250,
    duration=5
)
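
The `cfg_coef` parameter described above controls classifier-free guidance. The sketch below shows the common formulation of that technique, in which the final logits are extrapolated from an unconditional prediction toward a conditional one. The function name `apply_cfg` and the toy logit values are hypothetical, and it is an assumption that MusicGen combines the two predictions in exactly this way.

```python
def apply_cfg(cond_logits, uncond_logits, cfg_coef=3.0):
    """Combine conditional and unconditional logits.

    Common classifier-free guidance rule (assumed, not taken from MusicGen):
        guided = uncond + cfg_coef * (cond - uncond)
    """
    return [u + cfg_coef * (c - u) for c, u in zip(cond_logits, uncond_logits)]


# Toy logits for a three-token vocabulary (illustrative values only).
cond = [2.0, 0.0, -1.0]
uncond = [1.0, 0.5, -1.0]

print(apply_cfg(cond, uncond, cfg_coef=0.0))  # ignores the conditioning
print(apply_cfg(cond, uncond, cfg_coef=1.0))  # plain conditional prediction
print(apply_cfg(cond, uncond, cfg_coef=3.0))  # pushes further toward the conditioning
```

Under this formulation, a coefficient of 1.0 reproduces the conditional logits, and larger values exaggerate the difference that the text or melody conditioning makes.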

Next, we can generate music using one of the following modes:
* Unconditional samples using `model.generate_unconditional`
* Music continuation using `model.generate_continuation`
* Text-conditional samples using `model.generate`
* Melody-conditional samples using `model.generate_with_chroma`

### Unconditional Generation

In [None]:
from audiocraft.utils.notebook import display_audio

output = model.generate_unconditional(num_samples=2, progress=True)
display_audio(output, sample_rate=32000)

### Music Continuation

In [None]:
import math
import torchaudio
import torch
from audiocraft.utils.notebook import display_audio

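def get_bip_bip(bip_duration=0.125, frequency=440,
                duration=0.5, sample_rate=32000, device="cuda"):
    """Generates a series of bip bip at the given frequency."""
    t = torch.arange(
        int(duration * sample_rate), device=device, dtype=torch.float) / sample_rate
    # Cosine carrier at the requested frequency, with a leading channel dimension.
    wav = torch.cos(2 * math.pi * frequency * t)[None]
    # Position within each on/off cycle, normalized to [0, 1).
    tp = (t % (2 * bip_duration)) / (2 * bip_duration)
    # Silence the first half of each cycle and keep the second half.
    envelope = (tp >= 0.5).float()
    return wav * envelope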


In [None]:
# Here we use a synthetic signal to prompt both the tonality and the BPM
# of the generated audio.
res = model.generate_continuation(
    get_bip_bip(0.125).expand(2, -1, -1),
    32000, ['Jazz jazz and only jazz',
            'Heartful EDM with beautiful synths and chords'],
    progress=True)
display_audio(res, 32000)
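
To see what rhythm the synthetic prompt above actually carries, the following standard-library sketch reproduces the timing of the on/off envelope used in `get_bip_bip` without generating any audio. The helper names `bip_schedule` and `pulses_per_minute` are hypothetical, and treating one bip per cycle as one beat is an assumption about how the pulse rate relates to the perceived BPM.

```python
def bip_schedule(bip_duration=0.125, duration=0.5):
    """Return (start, end) times in seconds during which the envelope is on.

    Each cycle lasts 2 * bip_duration: silent for the first half,
    audible for the second half.
    """
    period = 2 * bip_duration
    segments = []
    start = bip_duration  # the first bip begins after one silent half-cycle
    while start < duration:
        segments.append((start, min(start + bip_duration, duration)))
        start += period
    return segments


def pulses_per_minute(bip_duration=0.125):
    """Number of bips per minute, given one bip per on/off cycle."""
    return 60.0 / (2 * bip_duration)


print(bip_schedule(0.125, 0.5))
print(pulses_per_minute(0.125))
```

Changing `bip_duration` changes the pulse rate of the prompt, and changing `frequency` in `get_bip_bip` changes its pitch, which gives two simple handles for steering the continuation.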

In [None]:
# You can also use any audio from a file. Make sure to trim the file if it is too long!
prompt_waveform, prompt_sr = torchaudio.load("./assets/bach.mp3")
prompt_duration = 2
prompt_waveform = prompt_waveform[..., :int(prompt_duration * prompt_sr)]
output = model.generate_continuation(prompt_waveform, prompt_sample_rate=prompt_sr, progress=True)
display_audio(output, sample_rate=32000)

### Text-conditional Generation

In [None]:
from audiocraft.utils.notebook import display_audio

output = model.generate(
    descriptions=[
        '80s pop track with bassy drums and synth',
        '90s rock song with loud guitars and heavy drums',
    ],
    progress=True
)
display_audio(output, sample_rate=32000)

### Melody-conditional Generation

In [None]:
import torchaudio
from audiocraft.utils.notebook import display_audio

model = MusicGen.get_pretrained('melody')
model.set_generation_params(duration=8)

melody_waveform, sr = torchaudio.load("assets/bach.mp3")
melody_waveform = melody_waveform.unsqueeze(0).repeat(2, 1, 1)
output = model.generate_with_chroma(
    descriptions=[
        '80s pop track with bassy drums and synth',
        '90s rock song with loud guitars and heavy drums',
    ],
    melody_wavs=melody_waveform,
    melody_sample_rate=sr,
    progress=True
)
display_audio(output, sample_rate=32000)