Update README.md
musicgen-songstarter-v0.2 is [`musicgen-stereo-melody-large`](https://huggingface.co/facebook/musicgen-stereo-melody-large) fine-tuned on a dataset of melody loops from my Splice sample library. It's intended to generate song ideas that are useful for music producers. It outputs stereo audio at 32 kHz.
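If you post-process or save the waveforms yourself, keep that output spec in mind: two channels at 32 kHz. A minimal sketch using Python's stdlib `wave` module (the sine signal and filename are just stand-ins of mine, not the model's output; in the usual audiocraft workflow, `audio_write` handles saving for you):

```python
import math
import struct
import wave

SAMPLE_RATE = 32_000  # the model generates audio at 32 kHz
CHANNELS = 2          # stereo output

# Stand-in signal: one second of a 440 Hz sine, copied to both channels.
# In practice this would be the model's generated waveform.
frames = bytearray()
for n in range(SAMPLE_RATE):
    sample = int(32767 * 0.2 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    frames += struct.pack("<hh", sample, sample)  # left, right (16-bit PCM)

with wave.open("songstarter_demo.wav", "wb") as f:
    f.setnchannels(CHANNELS)
    f.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
    f.setframerate(SAMPLE_RATE)
    f.writeframes(bytes(frames))
```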
**👀 Update:** I wrote a [blogpost](https://nateraw.com/posts/training_musicgen_songstarter.html) detailing how and why I trained this model, including training details, the dataset, Weights and Biases logs, etc.
Compared to [`musicgen-songstarter-v0.1`](https://huggingface.co/nateraw/musicgen-songstarter-v0.1), this new version:
- Was trained on 3x more unique, manually curated samples that I painstakingly purchased on Splice
- Is twice the size, bumped up from a `medium` ➡️ `large` transformer LM
Use the following prompt format:
```
{tag_1}, {tag_2}, ..., {tag_n}, {key}, {bpm} bpm
```
For example:
```
hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
```
For some example tags, [see the prompt format section of musicgen-songstarter-v0.1's readme](https://huggingface.co/nateraw/musicgen-songstarter-v0.1#prompt-format). The tags there are for the smaller v1 dataset, but should give you an idea of what the model saw.
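To avoid typos when assembling prompts by hand, the format above can be built with a small helper (a sketch; `build_prompt` is my own name, not part of the model or audiocraft):

```python
def build_prompt(tags, key, bpm):
    """Assemble a prompt in the expected format:
    {tag_1}, {tag_2}, ..., {tag_n}, {key}, {bpm} bpm
    """
    return ", ".join([*tags, key, f"{bpm} bpm"])

# Reproduces the example prompt above.
prompt = build_prompt(
    ["hip hop", "soul", "piano", "chords", "jazz", "neo jazz"],
    key="G# minor",
    bpm=140,
)
print(prompt)  # hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
```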
## Samples
<table style="width:100%; text-align:center;">
</table>
## Training Details
For more verbose details, you can check out the [blogpost](https://nateraw.com/posts/training_musicgen_songstarter.html#training).
- **code**:
  - Repo is [here](https://github.com/nateraw/audiocraft). It's an undocumented fork of [facebookresearch/audiocraft](https://github.com/facebookresearch/audiocraft) where I rewrote the training loop with PyTorch Lightning, which worked a bit better for me.
- **data**:
  - around 1700-1800 samples I manually listened to + purchased via my personal [Splice](https://splice.com) account. About 7-8 hours of audio.
  - Given the licensing terms, I cannot share the data.
- **hardware**:
  - 8x A100 40GB instance from Lambda Labs
- **procedure**:
  - trained for 10k steps, which took about 6 hours
  - reduced segment duration at train time to 15 seconds
- **hparams/logs**:
  - See the wandb [run](https://wandb.ai/nateraw/musicgen-songstarter-v0.2/runs/63gh4l7m), which includes training metrics, logs, hardware metrics at train time, hyperparameters, and the exact command I used when I ran the training script.
## Acknowledgements
This work would not have been possible without: