---
language: en
tags:
- audio
- music-generation
- sample-generation
- piano
- fine-tuning
- stable-audio
datasets:
- custom
model_name: Royal Cities Infinite Pianos - 2024 (SAO Finetune)
base_model: stabilityai/stable-audio-open-1.0
license: other
license_name: stabilityai-community-license
license_link: https://stability.ai/license
---
<center><img src="https://i.imgur.com/NH3EpPz.jpeg" alt="Header Logo" width="100%"></center>
<center>
<h2 style="font-size: 30px;"><u>Royal Cities Infinite Pianos - 2024 (SAO Finetune)</u></h2>
</center>
<center>
<h2 style="font-size: 19px;">Introduction</h2>
</center>
This finetuned Stable Audio Open model specializes in generating piano samples to support granular music production workflows. It is capable of creating an infinite variety of piano compositions, and all output is BPM-synced and key-locked to any note within the 12-tone chromatic scale, in both major and minor keys. Developed over several weeks, this model was trained on a custom dataset crafted within FL Studio and features three distinct piano types:
- **Grand Piano** (Native Instruments)
- **Soft Electric Piano 1** (Spitfire Audio)
- **Medium Electric Piano 2** (Spitfire Audio)
Furthermore, <b>the model is capable of generating both Tremolo and Reverb effects for each output,</b> with <i>several</i> different levels of control based on the prompt alone.
<center>
<h2 style="font-size: 19px;">Model Features</h2>
</center>
- **Multiple Types of Stem Generation:** Outputs three primary types of piano stems: Chord Progressions only, Melodies only, and Chord Progressions combined with Melodies.
- **Dynamic FX Chain:** Control reverb and tremolo settings through simple text prompts, e.g., "Low Reverb" for subtle reverb, "Medium Tremolo" for moderate tremolo, "High Spacey Reverb" for an expansive reverb effect, etc.
- **Tonal Versatility:** Generates piano stems in any key across the 12-tone chromatic scale, in both major and minor scales.
- **Simplified Scale Notation:** Scales are written using <b><i>sharps only</i></b> in the following format:
<pre>
<b>Minor Scales</b>
A minor, A# minor, B minor, C minor, C# minor, D minor, D# minor,
E minor, F minor, F# minor, G minor, G# minor
<b>Major Scales</b>
A major, A# major, B major, C major, C# major, D major, D# major,
E major, F major, F# major, G major, G# major
</pre>
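Because the model expects sharps only, a key written with flats (for example "Bb minor") has to be respelled before it goes into a prompt. The following is a minimal sketch of such a conversion; `normalize_key` and the lookup tables are hypothetical helpers written for illustration, not part of the model or the GitHub fork.

```python
# Hypothetical helper: respell flat key names using the sharps-only notation above.
FLAT_TO_SHARP = {
    "Ab": "G#", "Bb": "A#", "Cb": "B", "Db": "C#",
    "Eb": "D#", "Fb": "E", "Gb": "F#",
}
SHARP_NOTES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def normalize_key(key: str) -> str:
    """Return the key as '<note> <major|minor>' spelled with sharps only."""
    note, mode = key.strip().split()
    # Accept the Unicode flat/sharp signs as well as 'b' and '#'.
    note = note.replace("♭", "b").replace("♯", "#")
    note = note[0].upper() + note[1:]
    note = FLAT_TO_SHARP.get(note, note)
    mode = mode.lower()
    if note not in SHARP_NOTES or mode not in ("major", "minor"):
        raise ValueError(f"Unsupported key: {key!r}")
    return f"{note} {mode}"


print(normalize_key("Bb minor"))
print(normalize_key("Eb Major"))
```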
For more details on the VSTs and gear used in sample creation, refer to the Gear Used section below.
<center>
<h2 style="font-size: 19px;">Training Methodology</h2>
</center>
This model was designed to understand and generate three types of piano samples:
1. **Chord Progression Only Samples**
2. **Chord Progression with Melody Samples**
3. **Melody Only Samples**
Exposure to various musical motifs and distinct sample types has taught the model to distinguish a chord progression on its own from a top melody line. This enables it to generate either type by itself, or both combined, based entirely on the input prompt.
<center>
<h2 style="font-size: 24px;"><u>Usage Guide</u></h2>
</center>
<center>
<h2 style="font-size: 19px;">Supported GitHub Interfaces</h2>
</center>
I have designed an improved version of the existing Stable Audio Gradio interface that features quality-of-life improvements specific to music production and stem generation.
These include:
- **Dynamic Model Loading**: Enables dynamic model swaps of both this finetune and any future community finetune releases.
<center><img src="https://i.imgur.com/kB8CQ3J.gif" alt="Model Loader Gif" width="50%"></center>
- **Random Prompt Button**: A one-click Random Prompt button presently tuned to this specific model's metadata - as more models are released this will also be expanded.
<center><img src="https://i.imgur.com/fNEE8cR.gif" alt="Random Prompt Button Gif" width="80%"></center>
- **BPM & Bar Selector**: BPM & Bar settings tied to the model's timing conditioning, which will auto-fill any prompt with the needed BPM/Bar info. You can also lock or unlock the BPM if you wish to randomize this as well with the Random Prompt button.
<center><img src="https://i.imgur.com/hcedPl5.png" alt="BPM and Bar Example Gif" width="50%"></center>
- **Automatic Sample to MIDI Converter**: The fork will automatically convert all generated samples to .MID format, enabling users to have an infinite source of MIDI.
<center><img src="https://i.imgur.com/R9ipGiq.gif" alt="Midi Converter Example Gif" width="50%"></center>
- **Automatic Sample Trimming**: The fork will automatically trim all generated samples to the exact length desired for easier importing into DAWs.
<center><img src="https://i.imgur.com/ApH5SOM.gif" alt="Sample Trimming Example Gif" width="75%"></center>
<b>
<p align="center" style="font-size: 20px;">
You can find a <a href="https://github.com/RoyalCities/RC-stable-audio-tools" style="font-size: 20px;">direct link to the GitHub fork here.</a>
</p>
<p align="center" style="font-size: 20px;">
If you wish to use the original Stable Audio project then you can <a href="https://github.com/Stability-AI/stable-audio-tools" style="font-size: 20px;">follow this link.</a>
</p>
</b>
<center>
<h2 style="font-size: 24px;"><u>Prompt Structure</u></h2>
</center>
To ensure the best results, use the following format for your prompts:
<pre><b>
[Piano Type], [Modifier] [Chord Progression], [Melody Type], [Key], [FX], [BPM], [Bar Count]
</b></pre>
#### Chord Progression only example prompt with model output.
"Grand Piano, slow chord progression only, E minor, Medium Reverb, 110BPM, 8 bars"
<audio controls src="https://huggingface.co/RoyalCities/RC_Infinite_Pianos/resolve/main/example_1.mp3"></audio>
#### Chord Progression with melody example prompt with model output.
"Medium E. Piano, low chord progression with top catchy melody, B minor, High Spacey Reverb, 128BPM, 8 bars"
<audio controls src="https://huggingface.co/RoyalCities/RC_Infinite_Pianos/resolve/main/example_2.mp3"></audio>
#### Melody only example prompt with model output.
"Soft E. Piano, alternating top arp melody only, F minor, High Reverb, 128BPM, 8 bars"
<audio controls src="https://huggingface.co/RoyalCities/RC_Infinite_Pianos/resolve/main/example_3.mp3"></audio>
### Python Code for Modifiers
For users not using the GitHub fork, here's the relevant randomizer code so you can mix and match to get the best use out of the model.
```python
piano_types = ["Soft E. Piano", "Medium E. Piano", "Grand Piano"]
tremolo_effects = ["No Tremolo", "Low Tremolo", "Medium Tremolo", "High Tremolo"]
reverb_effects = ["No Reverb", "Low Reverb", "Medium Reverb", "High Reverb", "High Spacey Reverb"]
chord_progression_modifiers = ["simple", "complex", "dance plucky", "fast", "jazzy", "low", "simple strummed", "rising strummed", "complex strummed", "jazzy strummed", "slow strummed", "plucky dance", "rising", "falling", "slow", "slow jazzy", "fast jazzy", "smooth", "strummed", "plucky"]
melodies = [
    "catchy melody", "complex melody", "complex top melody", "catchy top melody", "top melody", "smooth melody", "complex catchy melody",
    "jazzy melody", "smooth catchy melody", "plucky dance melody", "dance melody", "alternating low melody", "alternating top arp melody",
    "alternating top melody", "alternating catchy melody", "top arp melody", "slow top melody", "fast top melody", "fast catchy top melody",
    "slow catchy top melody", "alternating melody", "falling arp melody",
    "rising arp melody", "top catchy melody",
]  # closing bracket was missing in the original listing
```
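As a rough illustration of how these lists can be combined, here is a minimal sketch of a prompt randomizer that follows the prompt structure and the three example prompts in this card. It is not the fork's actual implementation: `random_prompt` is a hypothetical function, the modifier and melody lists are abbreviated, and placing the tremolo and reverb settings side by side in the FX slot is an assumption.

```python
import random

PIANO_TYPES = ["Soft E. Piano", "Medium E. Piano", "Grand Piano"]
TREMOLO = ["No Tremolo", "Low Tremolo", "Medium Tremolo", "High Tremolo"]
REVERB = ["No Reverb", "Low Reverb", "Medium Reverb", "High Reverb", "High Spacey Reverb"]
# Abbreviated for the sketch; the full lists appear in this card.
CHORD_MODIFIERS = ["simple", "complex", "jazzy", "slow", "fast", "rising", "falling"]
MELODIES = ["catchy melody", "top melody", "top arp melody", "jazzy melody"]
NOTES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
BPMS = [100, 110, 120, 128, 130, 140, 150]
BARS = [4, 8]


def random_prompt(rng: random.Random) -> str:
    """Build one prompt for a randomly chosen stem type (hypothetical helper)."""
    piano = rng.choice(PIANO_TYPES)
    stem = rng.choice(["chords", "chords+melody", "melody"])
    if stem == "chords":
        content = f"{rng.choice(CHORD_MODIFIERS)} chord progression only"
    elif stem == "chords+melody":
        content = f"{rng.choice(CHORD_MODIFIERS)} chord progression with {rng.choice(MELODIES)}"
    else:
        content = f"{rng.choice(MELODIES)} only"
    key = f"{rng.choice(NOTES)} {rng.choice(['major', 'minor'])}"
    # Assumption: both FX settings are listed, tremolo first.
    fx = f"{rng.choice(TREMOLO)}, {rng.choice(REVERB)}"
    return f"{piano}, {content}, {key}, {fx}, {rng.choice(BPMS)}BPM, {rng.choice(BARS)} bars"


print(random_prompt(random.Random(42)))
```

Passing a seeded `random.Random` keeps the output reproducible, which helps when comparing generations across settings.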
#### BPMs/Bars:
The training BPMs range from 100BPM to 150BPM. The main values are **100BPM, 110BPM, 120BPM, 128BPM, 130BPM, 140BPM, 150BPM**.
There are 2 bar settings: **4 bars** and **8 bars**.
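The BPM and bar count together determine how long a trimmed sample should be. Assuming 4/4 time (four beats per bar, which this card does not state explicitly), the length is bars × 4 × 60 / BPM seconds. A minimal sketch, with `sample_seconds` as a hypothetical helper:

```python
TRAINING_BPMS = [100, 110, 120, 128, 130, 140, 150]


def sample_seconds(bpm: float, bars: int, beats_per_bar: int = 4) -> float:
    """Length in seconds of `bars` bars at `bpm`, assuming 4/4 time by default."""
    if bpm <= 0 or bars <= 0:
        raise ValueError("bpm and bars must be positive")
    return bars * beats_per_bar * 60.0 / bpm


for bpm in TRAINING_BPMS:
    four = sample_seconds(bpm, 4)
    eight = sample_seconds(bpm, 8)
    print(f"{bpm}BPM: 4 bars = {four:.2f}s, 8 bars = {eight:.2f}s")
```

Under that assumption, 8 bars at 128BPM works out to exactly 15 seconds.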
<center>
<h2 style="font-size: 24px;"><u>Dataset Breakdown</u></h2>
</center>
<center>
<h2 style="font-size: 19px;">Overview</h2>
</center>
- **Total .wav files**: 2030
- **Total duration**: 468.33 minutes
- **Average Length**: 13.84 seconds
- **Total Size**: 6.68 GB
- **Sample Rate**: 44100 Hz
<center>
<h2 style="font-size: 19px;">Breakdown by Piano Type and Category</h2>
</center>
<table align="center" style="width: 80%; border-collapse: collapse;">
<thead>
<tr>
<th style="border: 1px solid black; padding: 8px;">Piano Type</th>
<th style="border: 1px solid black; padding: 8px;">Chord Progression Only</th>
<th style="border: 1px solid black; padding: 8px;">Melody Only</th>
<th style="border: 1px solid black; padding: 8px;">Chord Progression with Melody</th>
</tr>
</thead>
<tbody>
<tr>
<td style="border: 1px solid black; padding: 8px; text-align: center;"><b>Grand Piano</b></td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">434</td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">205</td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">455</td>
</tr>
<tr>
<td style="border: 1px solid black; padding: 8px; text-align: center;"><b>Medium E. Piano</b></td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">198</td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">56</td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">203</td>
</tr>
<tr>
<td style="border: 1px solid black; padding: 8px; text-align: center;"><b>Soft E. Piano</b></td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">199</td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">44</td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">236</td>
</tr>
</tbody>
</table>
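To see how the examples are distributed across stem types, the table can be tallied in a few lines of Python. The counts below are copied from the table; the variable names are illustrative only.

```python
# Counts copied from the breakdown table: piano type -> stem category -> files.
counts = {
    "Grand Piano": {"Chord Progression Only": 434, "Melody Only": 205, "Chord Progression with Melody": 455},
    "Medium E. Piano": {"Chord Progression Only": 198, "Melody Only": 56, "Chord Progression with Melody": 203},
    "Soft E. Piano": {"Chord Progression Only": 199, "Melody Only": 44, "Chord Progression with Melody": 236},
}

# Total files per piano type and per stem category.
by_piano = {piano: sum(row.values()) for piano, row in counts.items()}
by_category = {}
for row in counts.values():
    for category, n in row.items():
        by_category[category] = by_category.get(category, 0) + n

total = sum(by_piano.values())
for category, n in by_category.items():
    print(f"{category}: {n} files ({100 * n / total:.1f}%)")
print(f"Total: {total} files")
```

The totals add up to the 2030 files listed in the overview, and Melody Only accounts for roughly 15% of the dataset.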
<center>
<h2 style="font-size: 19px;">Technical Specifications</h2>
</center>
- **Platform**: Runpod
- **Monitoring Tool**: Weights and Biases
- **Steps**: 2850
- **Learning Rate**: 5e-5
- **Optimizer**: AdamW
- **Scheduler**: InverseLR
- **Batch Size**: 32
- **Hardware**: 2x NVIDIA A6000 GPUs
See config file for further details.
<center>
<h2 style="font-size: 24px;"><u>Limitations and Biases</u></h2>
</center>
After much testing, I've determined the model is best at Chord Progressions with Melodies, which is not too surprising as these had the most examples. It fares reasonably well at Chord Progression Only generations, and it is hit or miss on Melody Only generations.
As far as staying in key is concerned, it performs admirably. However, there are a few quirks and biases with certain keys. For example, if you put in C# major, it will default to a C# minor progression/melody. I believe this is due to an imbalance in the dataset between these two keys, so the moment it sees C#, it will gravitate to the minor key.
As a possible workaround, I would suggest requesting samples in the relative minor/major. In the case of C# major, asking for melodies in A# minor may produce at least tonally usable results (since I ensured the metadata had both the primary key and the relative major/minor as well). This will need to be improved upon with further finetuning or as I look at how to better balance the dataset.
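The relative key is a fixed three-semitone shift: the relative minor sits three semitones below a major key's root, and the relative major three semitones above a minor key's root. A minimal sketch using the sharps-only note names this model expects; `relative_key` is a hypothetical helper, not part of the model or the fork.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def relative_key(key: str) -> str:
    """Return the relative minor of a major key, or the relative major of a minor key."""
    note, mode = key.split()
    index = NOTES.index(note)  # raises ValueError for unknown note names
    if mode == "major":
        return f"{NOTES[(index - 3) % 12]} minor"
    if mode == "minor":
        return f"{NOTES[(index + 3) % 12]} major"
    raise ValueError(f"Unsupported mode: {mode!r}")


print(relative_key("C# major"))
print(relative_key("A minor"))
```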
One other thing to be aware of is that the Grand Piano samples were not trained with Tremolo. As a side effect, any prompt that has a Tremolo modifier will often produce audio in the style of one of the Electric Pianos, even if the prompt asks for a Grand Piano.
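If prompts are generated programmatically, this combination can be caught before generation. The check below is a hypothetical sketch based only on the modifier names listed in this card; it is not part of the fork.

```python
TREMOLO_LEVELS = ("low tremolo", "medium tremolo", "high tremolo")


def has_grand_piano_tremolo(prompt: str) -> bool:
    """True if the prompt asks for a Grand Piano together with an active Tremolo level."""
    text = prompt.lower()
    # "No Tremolo" is not in TREMOLO_LEVELS, so it does not count as active.
    wants_tremolo = any(level in text for level in TREMOLO_LEVELS)
    return "grand piano" in text and wants_tremolo


prompt = "Grand Piano, slow chord progression only, E minor, Medium Tremolo, 110BPM, 8 bars"
if has_grand_piano_tremolo(prompt):
    print("Warning: Grand Piano with Tremolo may come out sounding like an Electric Piano.")
```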
Lastly, I have not tested the audio-to-audio capabilities of the model, so I cannot comment on its performance.
<center>
<h2 style="font-size: 24px;"><u>Gear Used</u></h2>
</center>
- **DAW:** FL Studio (Image-Line)
- **Grand Piano:** Alicia's Keys Kontakt Library (Native Instruments)
- **Soft Electric Piano:** Electric Piano DI (Spitfire Audio)
- **Medium Electric Piano:** Electric Piano Chorus (Spitfire Audio)
- **Low & Medium Reverb VST:** ValhallaRoom (ValhallaDSP)
- **High Reverb:** Solaris (Adam Szabo)
- **Tremolo:** FL Balance paired with FL Peak Controller LFO (Image-Line)
- **EQ:** PRO-Q3 (FabFilter)
<center>
<h2 style="font-size: 24px;"><u>License</u></h2>
</center>
This model is licensed under the Stability AI Community License. It is available for non-commercial use or limited commercial use by entities with annual revenues below USD $1M. For revenues exceeding USD $1M, please refer to the [LICENSE](./LICENSE.md) for detailed terms.