Update README.md

046d174 verified 6 months ago

6.08 kB

	---
	pipeline_tag: text-to-audio
	library_name: audiocraft
	language: en
	tags:
	- text-to-audio
	- musicgen
	- songstarter
	license: cc-by-nc-4.0
	---

	# Model Card for musicgen-songstarter-v0.2

	[![Replicate demo and cloud API](https://replicate.com/nateraw/musicgen-songstarter-v0.2/badge)](https://replicate.com/nateraw/musicgen-songstarter-v0.2) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/nateraw/0cb4c242b70af10044e9ae73f4617c86/songstarter-v0-2-demo.ipynb) [![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/nateraw/singing-songstarter)

	musicgen-songstarter-v0.2 is a [`musicgen-stereo-melody-large`](https://huggingface.co/facebook/musicgen-stereo-melody-large) fine-tuned on a dataset of melody loops from my Splice sample library. It's intended to be used to generate song ideas that are useful for music producers. It generates stereo audio in 32khz.

	👀 Update: I wrote a [blogpost](https://nateraw.com/posts/training_musicgen_songstarter.html) detailing how and why I trained this model, including training details, the dataset, Weights and Biases logs, etc.

	Compared to [`musicgen-songstarter-v0.1`](https://huggingface.co/nateraw/musicgen-songstarter-v0.1), this new version:
	- was trained on 3x more unique, manually-curated samples that I painstakingly purchased on Splice
	- Is twice the size, bumped up from size `medium` ➡️ `large` transformer LM

	If you find this model interesting, please consider:
	- following me on [GitHub](https://github.com/nateraw)
	- following me on [Twitter](https://twitter.com/_nateraw)

	## Usage

	Install [audiocraft](https://github.com/facebookresearch/audiocraft):

	```
	pip install -U git+https://github.com/facebookresearch/audiocraft#egg=audiocraft
	```

	Then, you should be able to load this model just like any other musicgen checkpoint here on the Hub:

	```python
	import torchaudio
	from audiocraft.models import MusicGen
	from audiocraft.data.audio import audio_write

	model = MusicGen.get_pretrained('nateraw/musicgen-songstarter-v0.2')
	model.set_generation_params(duration=8) # generate 8 seconds.
	wav = model.generate_unconditional(4) # generates 4 unconditional audio samples
	descriptions = ['acoustic, guitar, melody, trap, d minor, 90 bpm'] * 3
	wav = model.generate(descriptions) # generates 3 samples.

	melody, sr = torchaudio.load('./assets/bach.mp3')
	# generates using the melody from the given audio and the provided descriptions.
	wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)

	for idx, one_wav in enumerate(wav):
	# Will save under {idx}.wav, with loudness normalization at -14 db LUFS.
	audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
	```

	## Prompt Format

	Follow the following prompt format:

	```
	{tag_1}, {tag_2}, ..., {tag_n}, {key}, {bpm} bpm
	```

	For example:

	```
	hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
	```

	For some example tags, [see the prompt format section of musicgen-songstarter-v0.1's readme](https://huggingface.co/nateraw/musicgen-songstarter-v0.1#prompt-format). The tags there are for the smaller v1 dataset, but should give you an idea of what the model saw.

	## Samples

	<table style="width:100%; text-align:center;">
	<tr>
	<th>Audio Prompt</th>
	<th>Text Prompt</th>
	<th>Output</th>
	</tr>
	<tr>
	<td>
	<audio controls>
	<source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/kalhonaho.wav?download=true" type="audio/wav">
	Your browser does not support the audio element.
	</audio>
	</td>
	<td>
	trap, synthesizer, songstarters, dark, G# minor, 140 bpm
	</td>
	<td>
	<audio controls>
	<source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/kalhonaho_trap.wav?download=true" type="audio/wav">
	Your browser does not support the audio element.
	</audio>
	</td>
	</tr>
	<tr>
	<td>
	<audio controls>
	<source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/bach.mp3?download=true" type="audio/mp3">
	Your browser does not support the audio element.
	</audio>
	</td>
	<td>
	acoustic, guitar, melody, trap, D minor, 90 bpm
	</td>
	<td>
	<audio controls>
	<source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/bach_guitar.wav?download=true" type="audio/wav">
	Your browser does not support the audio element.
	</audio>
	</td>
	</tr>
	</table>

	## Training Details

	For more verbose details, you can check out the [blogpost](https://nateraw.com/posts/training_musicgen_songstarter.html#training).

	- code:
	- Repo is [here](https://github.com/nateraw/audiocraft). It's an undocumented fork of [facebookresearch/audiocraft](https://github.com/facebookresearch/audiocraft) where I rewrote the training loop with PyTorch Lightning, which worked a bit better for me.
	- data:
	- around 1700-1800 samples I manually listened to + purchased via my personal [Splice](https://splice.com) account. About 7-8 hours of audio.
	- Given the licensing terms, I cannot share the data.
	- hardware:
	- 8xA100 40GB instance from Lambda Labs
	- procedure:
	- trained for 10k steps, which took about 6 hours
	- reduced segment duration at train time to 15 seconds
	- hparams/logs:
	- See the wandb [run](https://wandb.ai/nateraw/musicgen-songstarter-v0.2/runs/63gh4l7m), which includes training metrics, logs, hardware metrics at train time, hyperparameters, and the exact command I used when I ran the training script.

	## Acknowledgements

	This work would not have been possible without:

	- [Lambda Labs](https://lambdalabs.com/), for subsidizing larger training runs by providing some compute credits
	- [Replicate](https://replicate.com/), for early development compute resources

	Thank you ❤️