---
pipeline_tag: text-to-audio
library_name: audiocraft
language: en
tags:
- text-to-audio
- musicgen
- songstarter
license: cc-by-nc-4.0
---

# Model Card for musicgen-songstarter-v0.2

[![Replicate demo and cloud API](https://replicate.com/nateraw/musicgen-songstarter-v0.2/badge)](https://replicate.com/nateraw/musicgen-songstarter-v0.2) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/nateraw/0cb4c242b70af10044e9ae73f4617c86/songstarter-v0-2-demo.ipynb) [![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/nateraw/singing-songstarter)

musicgen-songstarter-v0.2 is a [`musicgen-stereo-melody-large`](https://huggingface.co/facebook/musicgen-stereo-melody-large) fine-tuned on a dataset of melody loops from my Splice sample library. It's intended to be used to generate song ideas that are useful for music producers. It generates stereo audio at 32 kHz.

Compared to [`musicgen-songstarter-v0.1`](https://huggingface.co/nateraw/musicgen-songstarter-v0.1), this new version:
- was trained on 3x more unique, manually curated samples that I painstakingly purchased on Splice
- is twice the size, bumped up from a `medium` ➡️ `large` transformer LM

If you find this model interesting, please consider:
  - following me on [GitHub](https://github.com/nateraw)
  - following me on [Twitter](https://twitter.com/_nateraw)

## Usage

Install [audiocraft](https://github.com/facebookresearch/audiocraft):

```
pip install -U git+https://github.com/facebookresearch/audiocraft#egg=audiocraft
```

You can then load this model just like any other MusicGen checkpoint on the Hub:

```python
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('nateraw/musicgen-songstarter-v0.2')
model.set_generation_params(duration=8)  # generate 8 seconds.
wav = model.generate_unconditional(4)    # generates 4 unconditional audio samples
descriptions = ['acoustic, guitar, melody, trap, d minor, 90 bpm'] * 3
wav = model.generate(descriptions)  # generates 3 samples.

# bach.mp3 is in this repo's assets/ folder; adjust the path to wherever you saved it.
melody, sr = torchaudio.load('./assets/bach.mp3')
# generates using the melody from the given audio and the provided descriptions.
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
```
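Each call above returns a batch of waveforms, one per description (or one per requested unconditional sample), and the loop writes them to disk one at a time. As a rough guide to what `wav` holds, here is a minimal sketch assuming the output is laid out as `[batch, channels, samples]` for this stereo, 32 kHz model. `expected_output_shape` is a hypothetical helper, not an audiocraft function, and the decoded length may differ slightly from `duration × 32,000`. The melody tensor needs the same batch size, which is why the example repeats the single clip with `melody[None].expand(3, -1, -1)` to match the 3 descriptions.

```python
# Hypothetical helper (not part of audiocraft): estimates the shape of the
# tensor returned by model.generate / generate_with_chroma for this checkpoint.
SAMPLE_RATE = 32_000  # 32 kHz, as stated for this model
NUM_CHANNELS = 2      # the model generates stereo audio


def expected_output_shape(num_prompts: int, duration_s: float) -> tuple:
    """Return an approximate (batch, channels, samples) shape."""
    num_samples = round(duration_s * SAMPLE_RATE)
    return (num_prompts, NUM_CHANNELS, num_samples)


# 3 descriptions at duration=8, as in the example above.
print(expected_output_shape(3, 8))  # (3, 2, 256000)
```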

## Prompt Format

Prompts should follow this format:

```
{tag_1}, {tag_2}, ..., {tag_n}, {key}, {bpm} bpm
```

For example:

```
hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
```
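Because the format is plain comma-separated text, prompts can be assembled or checked programmatically. The sketch below uses two hypothetical helpers, `build_prompt` and `parse_prompt` (not part of audiocraft or this repo), and assumes that individual tags never contain commas and that the last two fields are always the key and the tempo.

```python
import re
from typing import List, Tuple

# Hypothetical helpers (not part of audiocraft): they only format and split
# strings following "{tag_1}, {tag_2}, ..., {tag_n}, {key}, {bpm} bpm".


def build_prompt(tags: List[str], key: str, bpm: int) -> str:
    """Join the tags, then the key, then the tempo, all comma-separated."""
    if not tags:
        raise ValueError("at least one tag is expected")
    parts = [t.strip() for t in tags] + [key.strip(), f"{int(bpm)} bpm"]
    return ", ".join(parts)


def parse_prompt(prompt: str) -> Tuple[List[str], str, int]:
    """Split a prompt into (tags, key, bpm); the last two fields are key and tempo."""
    parts = [p.strip() for p in prompt.split(",")]
    if len(parts) < 3:
        raise ValueError("expected at least one tag, a key, and a tempo")
    *tags, key, tempo = parts
    match = re.fullmatch(r"(\d+)\s*bpm", tempo, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"last field should look like '140 bpm', got {tempo!r}")
    return tags, key, int(match.group(1))


prompt = build_prompt(["hip hop", "soul", "piano", "chords", "jazz", "neo jazz"], "G# minor", 140)
print(prompt)  # hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
print(parse_prompt("acoustic, guitar, melody, trap, D minor, 90 bpm"))
# (['acoustic', 'guitar', 'melody', 'trap'], 'D minor', 90)
```

A list of strings built this way can be passed as `descriptions` to `model.generate`.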

## Samples

<table style="width:100%; text-align:center;">
  <tr>
    <th>Audio Prompt</th>
    <th>Text Prompt</th>
    <th>Output</th>
  </tr>
  <tr>
    <td>
      <audio controls>
        <source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/kalhonaho.wav?download=true" type="audio/wav">
        Your browser does not support the audio element.
      </audio>
    </td>
    <td>
      trap, synthesizer, songstarters, dark, G# minor, 140 bpm
    </td>
    <td>
      <audio controls>
        <source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/kalhonaho_trap.wav?download=true" type="audio/wav">
        Your browser does not support the audio element.
      </audio>
    </td>
  </tr>
  <tr>
    <td>
      <audio controls>
        <source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/bach.mp3?download=true" type="audio/mpeg">
        Your browser does not support the audio element.
      </audio>
    </td>
    <td>
      acoustic, guitar, melody, trap, D minor, 90 bpm
    </td>
    <td>
      <audio controls>
        <source src="https://huggingface.co/nateraw/musicgen-songstarter-v0.2/resolve/main/assets/bach_guitar.wav?download=true" type="audio/wav">
        Your browser does not support the audio element.
      </audio>
    </td>
  </tr>
</table>

## Acknowledgements

This work would not have been possible without:

- [Lambda Labs](https://lambdalabs.com/), for subsidizing larger training runs by providing some compute credits
- [Replicate](https://replicate.com/), for early development compute resources

Thank you ❤️