nateraw committed
Commit 046d174
1 Parent(s): 367b2c6

Update README.md

Files changed (1):
  1. README.md +22 -1
README.md CHANGED
@@ -15,6 +15,8 @@ license: cc-by-nc-4.0

musicgen-songstarter-v0.2 is a [`musicgen-stereo-melody-large`](https://huggingface.co/facebook/musicgen-stereo-melody-large) fine-tuned on a dataset of melody loops from my Splice sample library. It's intended to be used to generate song ideas that are useful for music producers. It generates stereo audio in 32khz.

+ **👀 Update:** I wrote a [blogpost](https://nateraw.com/posts/training_musicgen_songstarter.html) detailing how and why I trained this model, including training details, the dataset, Weights and Biases logs, etc.
+
Compared to [`musicgen-songstarter-v0.1`](https://huggingface.co/nateraw/musicgen-songstarter-v0.1), this new version:
- was trained on 3x more unique, manually-curated samples that I painstakingly purchased on Splice
- Is twice the size, bumped up from size `medium` ➡️ `large` transformer LM
@@ -58,7 +60,7 @@ for idx, one_wav in enumerate(wav):
Follow the following prompt format:

```
- {tag_1}, {tag_1}, ..., {tag_n}, {key}, {bpm} bpm
+ {tag_1}, {tag_2}, ..., {tag_n}, {key}, {bpm} bpm
```

For example:
@@ -67,6 +69,8 @@ For example:
hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
```

+ For some example tags, [see the prompt format section of musicgen-songstarter-v0.1's readme](https://huggingface.co/nateraw/musicgen-songstarter-v0.1#prompt-format). The tags there are for the smaller v1 dataset, but should give you an idea of what the model saw.
+
## Samples

<table style="width:100%; text-align:center;">
@@ -111,6 +115,23 @@ hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
</tr>
</table>

+ ## Training Details
+
+ For more verbose details, you can check out the [blogpost](https://nateraw.com/posts/training_musicgen_songstarter.html#training).
+
+ - **code**:
+   - Repo is [here](https://github.com/nateraw/audiocraft). It's an undocumented fork of [facebookresearch/audiocraft](https://github.com/facebookresearch/audiocraft) where I rewrote the training loop with PyTorch Lightning, which worked a bit better for me.
+ - **data**:
+   - around 1700-1800 samples I manually listened to + purchased via my personal [Splice](https://splice.com) account. About 7-8 hours of audio.
+   - Given the licensing terms, I cannot share the data.
+ - **hardware**:
+   - 8xA100 40GB instance from Lambda Labs
+ - **procedure**:
+   - trained for 10k steps, which took about 6 hours
+   - reduced segment duration at train time to 15 seconds
+ - **hparams/logs**:
+   - See the wandb [run](https://wandb.ai/nateraw/musicgen-songstarter-v0.2/runs/63gh4l7m), which includes training metrics, logs, hardware metrics at train time, hyperparameters, and the exact command I used when I ran the training script.
+
## Acknowledgements

This work would not have been possible without:
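For reference, the prompt format corrected in this commit is simply the text description passed to audiocraft's standard `MusicGen` API. The sketch below is a minimal, illustrative example rather than the exact snippet already in the readme (the `for idx, one_wav in enumerate(wav):` loop referenced in the second hunk header). It assumes the checkpoint resolves via the Hugging Face id `nateraw/musicgen-songstarter-v0.2`, picks an arbitrary 15-second duration (matching the training segment length noted in the added Training Details section), and uses a hypothetical local file for the optional melody conditioning.

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write
import torchaudio

# Load the fine-tuned checkpoint (assumed Hugging Face id for this repo).
model = MusicGen.get_pretrained('nateraw/musicgen-songstarter-v0.2')
model.set_generation_params(duration=15)  # seconds; illustrative choice

# Prompt in the "{tag_1}, {tag_2}, ..., {tag_n}, {key}, {bpm} bpm" format.
prompt = 'hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm'

# Plain text-to-audio generation: returns a tensor of shape (batch, channels, samples).
wav = model.generate([prompt], progress=True)

# Optional: condition on a melody loop instead, since the base model is melody-capable.
# 'my_loop.wav' is a hypothetical input file.
melody, sr = torchaudio.load('my_loop.wav')
wav = model.generate_with_chroma([prompt], melody[None], sr, progress=True)

for idx, one_wav in enumerate(wav):
    # audio_write appends the .wav extension and applies loudness normalization.
    audio_write(f'songstarter_{idx}', one_wav.cpu(), model.sample_rate, strategy='loudness')
```

The output is stereo audio at `model.sample_rate`, which is 32 kHz per the model description above.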