---
pipeline_tag: text-to-audio
library_name: audiocraft
language: en
tags:
- text-to-audio
- musicgen
- songstarter
license: cc-by-nc-4.0
---

# Model Card for musicgen-songstarter-v0.2

[![Replicate demo and cloud API](https://replicate.com/nateraw/musicgen-songstarter-v0.2/badge)](https://replicate.com/nateraw/musicgen-songstarter-v0.2)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/nateraw/0cb4c242b70af10044e9ae73f4617c86/songstarter-v0-2-demo.ipynb)
[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/nateraw/singing-songstarter)

musicgen-songstarter-v0.2 is a [`musicgen-stereo-melody-large`](https://huggingface.co/facebook/musicgen-stereo-melody-large) model fine-tuned on a dataset of melody loops from my Splice sample library. It's intended to generate song ideas that are useful for music producers. It generates stereo audio at 32 kHz.

Compared to [`musicgen-songstarter-v0.1`](https://huggingface.co/nateraw/musicgen-songstarter-v0.1), this new version:
- was trained on 3x more unique, manually curated samples that I painstakingly purchased on Splice
- is twice the size, bumped up from a `medium` ➡️ `large` transformer LM

If you find this model interesting, please consider:
- following me on [GitHub](https://github.com/nateraw)
- following me on [Twitter](https://twitter.com/_nateraw)

## Usage

Install [audiocraft](https://github.com/facebookresearch/audiocraft):

```
pip install -U git+https://github.com/facebookresearch/audiocraft#egg=audiocraft
```

Then, you should be able to load this model just like any other musicgen checkpoint here on the Hub:

```python
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('nateraw/musicgen-songstarter-v0.2')
model.set_generation_params(duration=8)  # generate 8 seconds.

wav = model.generate_unconditional(4)  # generates 4 unconditional audio samples

descriptions = ['acoustic, guitar, melody, trap, d minor, 90 bpm'] * 3
wav = model.generate(descriptions)  # generates 3 samples.

melody, sr = torchaudio.load('./assets/bach.mp3')
# generates using the melody from the given audio and the provided descriptions.
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
```

## Prompt Format

Use the following prompt format:

```
{tag_1}, {tag_2}, ..., {tag_n}, {key}, {bpm} bpm
```

For example:

```
hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
```

## Samples

| Audio Prompt | Text Prompt | Output |
| --- | --- | --- |
| | trap, synthesizer, songstarters, dark, G# minor, 140 bpm | |
| | acoustic, guitar, melody, trap, D minor, 90 bpm | |

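
Both text prompts above follow the prompt format: a list of tags, then the key, then the tempo. As a minimal sketch, a small helper can assemble such strings so the ordering and the `bpm` suffix stay consistent. Note that `build_prompt` is a hypothetical name used here for illustration and is not part of audiocraft.

```python
def build_prompt(tags, key, bpm):
    """Join tags, key, and tempo as '{tag_1}, ..., {tag_n}, {key}, {bpm} bpm'."""
    # Hypothetical helper for illustration only; not part of audiocraft.
    parts = [tag.strip() for tag in tags if tag.strip()]
    if not parts:
        raise ValueError("at least one tag is required")
    # Key and tempo always come last, in that order.
    return ", ".join(parts + [key.strip(), f"{int(bpm)} bpm"])


prompt = build_prompt(["acoustic", "guitar", "melody", "trap"], "D minor", 90)
print(prompt)  # acoustic, guitar, melody, trap, D minor, 90 bpm
```

The resulting string can be placed in the `descriptions` list passed to `model.generate` or `model.generate_with_chroma`.
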
## Acknowledgements

This work would not have been possible without:
- [Lambda Labs](https://lambdalabs.com/), for subsidizing larger training runs by providing some compute credits
- [Replicate](https://replicate.com/), for early development compute resources

Thank you ❤️