language: en
tags:
- audio
- music-generation
- sample-generation
- piano
- fine-tuning
- stable-audio
datasets:
- custom
model_name: Royal Cities Infinite Pianos - 2024 (SAO Finetune)
base_model: stabilityai/stable-audio-open-1.0
license: other
license_name: stabilityai-community-license
license_link: https://stability.ai/license
Royal Cities Infinite Pianos - 2024 (SAO Finetune)
Introduction
- Grand Piano (Native Instruments)
- Soft Electric Piano 1 (Spitfire Audio)
- Medium Electric Piano 2 (Spitfire Audio)
Furthermore, the model is capable of generating both Tremolo and Reverb effects for each output, with several different levels of control based on the prompt alone.
Model Features
- Multiple Types of Stem Generation: Outputs three primary types of piano stems: Chord Progressions only, Melodies only, and Chord Progressions combined with Melodies.
- Dynamic FX Chain: Control reverb and tremolo settings through simple text prompts, e.g., "Low Reverb" for subtle reverb, "Medium Tremolo" for moderate tremolo, "High Spacey Reverb" for an expansive reverb effect, etc.
- Tonal Versatility: Generates piano stems in any key across the 12-tone chromatic scale, in both major and minor scales.
- Simplified Scale Notation: Scales are written using sharps only in the following format:
Minor Scales: A minor, A# minor, B minor, C minor, C# minor, D minor, D# minor, E minor, F minor, F# minor, G minor, G# minor
Major Scales: A major, A# major, B major, C major, C# major, D major, D# major, E major, F major, F# major, G major, G# major
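The sharps-only key names can also be generated programmatically, which helps when a key arrives spelled with flats. The sketch below is illustrative only: `all_keys` and `normalize_key` are hypothetical helper names, not part of the model or the Gradio fork.

```python
# Sharps-only note names, starting from A to match the order used in this card.
NOTES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
MODES = ("minor", "major")

# Enharmonic flat spellings mapped to the sharps-only spelling.
FLAT_TO_SHARP = {"Ab": "G#", "Bb": "A#", "Db": "C#", "Eb": "D#", "Gb": "F#"}


def all_keys():
    """Return the 24 key names: 12 minor keys first, then 12 major keys."""
    return [f"{note} {mode}" for mode in MODES for note in NOTES]


def normalize_key(key: str) -> str:
    """Rewrite a key such as 'Bb minor' in the sharps-only format ('A# minor')."""
    tonic, mode = key.split()
    tonic = FLAT_TO_SHARP.get(tonic, tonic)
    if tonic not in NOTES or mode not in MODES:
        raise ValueError(f"Unsupported key: {key!r}")
    return f"{tonic} {mode}"
```

For example, `normalize_key("Bb minor")` gives `"A# minor"`, which is the spelling to use in a prompt.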
For more details on the VSTs and gear used in sample creation, refer to the Gear section below.
Training Methodology
This model was designed to understand and generate three types of piano samples:
- Chord Progression Only Samples
- Chord Progression with Melody Samples
- Melody Only Samples
By exposing the model to varied musical motifs and clearly distinct sample types, it has learned to distinguish a chord progression on its own from a top melody line. This lets it generate either type alone, or both combined, based entirely on the input prompt.
Usage Guide
Supported GitHub Interfaces
I have designed an improved version of the existing Stable Audio Gradio interface that adds quality-of-life improvements specific to music production and stem generation.
These include:
- Dynamic Model Loading: Enables dynamic model swaps of both this finetune and any future community finetune releases.
- Random Prompt Button: A one-click Random Prompt button, presently tuned to this specific model's metadata. It will be expanded as more models are released.
- BPM & Bar Selector: BPM & Bar settings tied to the model's timing conditioning, which auto-fill any prompt with the needed BPM/Bar info. You can also lock the BPM, or unlock it so that the Random Prompt button randomizes it as well.
- Automatic Sample to MIDI Converter: The fork will automatically convert all generated samples to .MID format, enabling users to have an infinite source of MIDI.
- Automatic Sample Trimming: The fork will automatically trim all generated samples to the exact length desired for easier importing into DAWs.
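How the fork computes the trim length is not described here, but the target length of a loop follows directly from its BPM and bar count. The sketch below assumes 4/4 time (four beats per bar) and a 44100 Hz sample rate, which matches the dataset; the function names are hypothetical.

```python
def sample_length_seconds(bpm: float, bars: int, beats_per_bar: int = 4) -> float:
    """Length in seconds of `bars` bars at `bpm`, assuming `beats_per_bar` beats per bar."""
    if bpm <= 0 or bars <= 0:
        raise ValueError("bpm and bars must be positive")
    # One beat lasts 60 / bpm seconds.
    return bars * beats_per_bar * 60.0 / bpm


def sample_length_frames(bpm: float, bars: int, sample_rate: int = 44100,
                         beats_per_bar: int = 4) -> int:
    """Same length expressed in audio frames at `sample_rate` (assumed 44100 Hz)."""
    return round(sample_length_seconds(bpm, bars, beats_per_bar) * sample_rate)


# 8 bars at 120 BPM is 32 beats of 0.5 s each.
print(sample_length_seconds(120, 8))   # 16.0
# 4 bars at 128 BPM, in frames at 44100 Hz.
print(sample_length_frames(128, 4))    # 330750
```

Under these assumptions, trimming a generated sample amounts to keeping its first `sample_length_frames(bpm, bars)` frames.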
You can find a direct link to the GitHub fork here.
If you wish to use the original Stable Audio GitHub repository instead, you can follow this link.
Two versions of the model are available:
- RC_Infinite_Pianos_2024.ckpt: the full 32-bit model
- RC_Infinite_Pianos_2024_Small.ckpt: a smaller version quantized to FP16
To use the model, place either .ckpt file and the config .json in their own sub-folder inside the "models" folder, then launch the Gradio interface.
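The folder layout described above can be set up by hand, or with a small script like the sketch below. It is only an illustration: `install_model` is a hypothetical helper, the sub-folder name defaults to the checkpoint's file name without extension (an assumption, not a requirement stated here), and the config file name must be whatever .json ships with the checkpoint.

```python
import shutil
from pathlib import Path


def install_model(ckpt_path, config_path, models_dir="models", subfolder=None):
    """Copy a checkpoint and its config into models_dir/<subfolder>/ and return that folder."""
    ckpt = Path(ckpt_path)
    config = Path(config_path)
    for f in (ckpt, config):
        if not f.is_file():
            raise FileNotFoundError(f"Missing file: {f}")

    # Assumed naming: use the checkpoint name (without .ckpt) unless a name is given.
    target = Path(models_dir) / (subfolder or ckpt.stem)
    target.mkdir(parents=True, exist_ok=True)

    shutil.copy2(ckpt, target / ckpt.name)
    shutil.copy2(config, target / config.name)
    return target


# Example call (paths are placeholders for wherever the files were downloaded):
# install_model("RC_Infinite_Pianos_2024_Small.ckpt", "<config>.json")
```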
VST Support
This model has direct VST compatibility in the Audialab Engine.
Prompt Structure
To ensure the best results, use the following format for your prompts:
[Piano Type], [Modifier][Chord Progression], [Melody Type], [Key], [FX], [BPM], [Bar Count]
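As a rough sketch, the format can be assembled from its parts with a small helper. `build_prompt` is a hypothetical function, not part of the fork; the "only" and "with" wording is inferred from the example prompts in this section, and the FX field is passed through as a single string because the card does not show how several effects are combined.

```python
def build_prompt(piano, key, fx, bpm, bars, chord_modifier=None, melody=None):
    """Assemble a prompt in the card's format from its individual parts.

    Pass `chord_modifier` for a chord progression, `melody` for a melody,
    or both for a chord progression with a melody on top.
    """
    if chord_modifier and melody:
        content = f"{chord_modifier} chord progression with {melody}"
    elif chord_modifier:
        content = f"{chord_modifier} chord progression only"
    elif melody:
        content = f"{melody} only"
    else:
        raise ValueError("Provide a chord_modifier, a melody, or both")
    return f"{piano}, {content}, {key}, {fx}, {bpm}BPM, {bars} bars"


print(build_prompt("Grand Piano", "E minor", "Medium Reverb", 110, 8,
                   chord_modifier="slow"))
# Grand Piano, slow chord progression only, E minor, Medium Reverb, 110BPM, 8 bars
```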
Chord Progression only example prompt with model output.
"Grand Piano, slow chord progression only, E minor, Medium Reverb, 110BPM, 8 bars"
Chord Progression with melody example prompt with output.
"Medium E. Piano, low chord progression with top catchy melody, B minor, High Spacey Reverb, 128BPM, 8 bars"
Melody only example prompt with model output.
"Soft E. Piano, alternating top arp melody only, F minor, High Reverb, 128BPM, 8 bars"
Python Code for Modifiers
For users not using the GitHub fork, here's the relevant randomizer code so you can mix and match to get the best use out of the model.
piano_types = ["Soft E. Piano", "Medium E. Piano", "Grand Piano"]
tremolo_effects = ["No Tremolo", "Low Tremolo", "Medium Tremolo", "High Tremolo"]
reverb_effects = ["No Reverb", "Low Reverb", "Medium Reverb", "High Reverb", "High Spacey Reverb"]
chord_progression_modifiers = ["simple", "complex", "dance plucky", "fast", "jazzy", "low", "simple strummed", "rising strummed", "complex strummed", "jazzy strummed", "slow strummed", "plucky dance", "rising", "falling", "slow", "slow jazzy", "fast jazzy", "smooth", "strummed", "plucky"]
melodies = [
    "catchy melody", "complex melody", "complex top melody", "catchy top melody", "top melody", "smooth melody", "complex catchy melody",
    "jazzy melody", "smooth catchy melody", "plucky dance melody", "dance melody", "alternating low melody", "alternating top arp melody", "alternating top melody", "alternating catchy melody", "top arp melody", "slow top melody", "fast top melody", "fast catchy top melody", "slow catchy top melody", "alternating melody", "falling arp melody",
    "rising arp melody", "top catchy melody",
]
BPMs/Bars:
The training samples range from 100BPM to 150BPM. The main values are 100BPM, 110BPM, 120BPM, 128BPM, 130BPM, 140BPM, and 150BPM.
There are two bar settings: 4 bars and 8 bars.
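For randomizing the timing part of a prompt, a minimal sketch is shown below. It loosely mirrors the lock/unlock BPM option described for the fork, but it is not the fork's code: `random_timing` and its parameters are hypothetical names.

```python
import random

# Main BPM values and bar settings listed above.
BPMS = [100, 110, 120, 128, 130, 140, 150]
BARS = [4, 8]


def random_timing(rng=None, locked_bpm=None):
    """Return the timing part of a prompt, e.g. '128BPM, 8 bars'.

    If `locked_bpm` is given it is kept as-is; otherwise a BPM is drawn
    from the main values. The bar count is always drawn at random.
    """
    rng = rng or random.Random()
    if locked_bpm is not None and not 100 <= locked_bpm <= 150:
        raise ValueError("The training data only covers 100BPM to 150BPM")
    bpm = locked_bpm if locked_bpm is not None else rng.choice(BPMS)
    bars = rng.choice(BARS)
    return f"{bpm}BPM, {bars} bars"


# Pass a seeded generator for reproducible choices.
print(random_timing(random.Random(0)))
print(random_timing(locked_bpm=128))
```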
Dataset Breakdown
Overview
- Total .wav files: 2030
- Total duration: 468.33 minutes
- Average Length: 13.84 seconds
- Total Size: 6.68 GB
- Sample Rate: 44100 Hz
Breakdown by Piano Type and Category
Piano Type | Chord Progression Only | Melody Only | Chord Progression with Melody |
---|---|---|---|
Grand Piano | 434 | 205 | 455 |
Medium E. Piano | 198 | 56 | 203 |
Soft E. Piano | 199 | 44 | 236 |
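The table can be cross-checked against the overview figures with a few lines of arithmetic. The dictionary keys below are just labels chosen for this sketch; the numbers are copied from the table and the overview list.

```python
# File counts per piano type and category, copied from the table above.
counts = {
    "Grand Piano":     {"chords_only": 434, "melody_only": 205, "chords_with_melody": 455},
    "Medium E. Piano": {"chords_only": 198, "melody_only": 56,  "chords_with_melody": 203},
    "Soft E. Piano":   {"chords_only": 199, "melody_only": 44,  "chords_with_melody": 236},
}

# Totals per piano type, per category, and overall.
per_piano = {piano: sum(cats.values()) for piano, cats in counts.items()}
per_category = {
    cat: sum(cats[cat] for cats in counts.values())
    for cat in ("chords_only", "melody_only", "chords_with_melody")
}
total_files = sum(per_piano.values())

# Average length implied by the total duration in the overview (468.33 minutes).
average_seconds = 468.33 * 60 / total_files

print(per_piano)       # {'Grand Piano': 1094, 'Medium E. Piano': 457, 'Soft E. Piano': 479}
print(per_category)    # {'chords_only': 831, 'melody_only': 305, 'chords_with_melody': 894}
print(total_files)     # 2030
print(round(average_seconds, 2))  # 13.84
```

The totals match the 2030 files and 13.84 second average in the overview, and they show that Melody Only is the smallest category with 305 files.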
Technical Specifications
- Platform: Runpod
- Monitoring Tool: Weights and Biases
- Steps: 2850
- Learning Rate: 5e-5
- Optimizer: AdamW
- Scheduler: InverseLR
- Batch Size: 32
- Hardware: 2x NVIDIA A6000 GPUs
See config file for further details.
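To get a feel for how long the run was relative to the dataset, the step count and batch size can be converted into approximate passes over the 2030 files. The card does not say whether the batch size of 32 is the total or per GPU, so the sketch below shows both readings; `dataset_passes` is a hypothetical helper.

```python
def dataset_passes(steps, batch_size, dataset_size, num_replicas=1):
    """Approximate number of passes over the dataset for a training run.

    `num_replicas` multiplies the batch size; use it if the stated batch
    size is per GPU rather than the total across GPUs.
    """
    samples_seen = steps * batch_size * num_replicas
    return samples_seen / dataset_size


# Reading 1 (assumption): 32 is the total batch size across both GPUs.
passes_total_batch = dataset_passes(2850, 32, 2030)
# Reading 2 (assumption): 32 is the batch size on each of the 2 GPUs.
passes_per_gpu_batch = dataset_passes(2850, 32, 2030, num_replicas=2)

print(f"{passes_total_batch:.1f}")    # 44.9
print(f"{passes_per_gpu_batch:.1f}")  # 89.9
```

Under either reading, each training file was seen many times, roughly 45 or 90 times on average.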
Limitations and Biases
After extensive testing, I found that the model is strongest at Chord Progressions with Melodies, which is unsurprising since this category had the most examples. It performs reasonably well on Chord Progression Only prompts and is hit or miss on Melody Only generations.
As far as staying in key is concerned, it performs admirably. However, there are a few quirks and biases with certain keys. For example, if you put in C# major, it will default to a C# minor progression/melody. I believe this is due to an imbalance in the dataset between both of these keys, so the moment it sees C#, it will gravitate to the minor key.
As a possible workaround, try requesting samples in the relative minor/major instead. In the case of C# major, asking for melodies in A# minor may produce at least tonally usable results, since I ensured the metadata included both the primary key and its relative major/minor. This will need to be improved with further finetuning or by better balancing the dataset.
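The relative key used in this workaround is easy to compute: the relative minor sits three semitones below the major tonic, and the relative major three semitones above the minor tonic. A minimal sketch in the sharps-only notation (`relative_key` is a hypothetical helper name):

```python
# Chromatic scale in the sharps-only spelling used by the model's prompts.
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def relative_key(key: str) -> str:
    """Return the relative minor of a major key, or the relative major of a minor key."""
    tonic, mode = key.split()
    index = NOTES.index(tonic)  # raises ValueError for an unknown tonic
    if mode == "major":
        return f"{NOTES[(index - 3) % 12]} minor"
    if mode == "minor":
        return f"{NOTES[(index + 3) % 12]} major"
    raise ValueError(f"Unknown mode in key: {key!r}")


print(relative_key("C# major"))  # A# minor
print(relative_key("E minor"))   # G major
```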
Also note that the Grand Piano samples were not trained with Tremolo. As a side effect, any prompt that includes a Tremolo modifier will often produce audio in the style of one of the Electric Pianos, even if the prompt asks for a Grand Piano.
Lastly, I have not tested the audio-to-audio capabilities of the model, so I cannot comment on its performance.
Gear Used
- DAW: FL Studio (Image-Line)
- Grand Piano: Alicia's Keys Kontakt Library (Native Instruments)
- Soft Electric Piano: Electric Piano DI (Spitfire Audio)
- Medium Electric Piano: Electric Piano Chorus (Spitfire Audio)
- Low & Medium Reverb VST: ValhallaRoom (ValhallaDSP)
- High Reverb: Solaris (Adam Szabo)
- Tremolo: FL Balance paired with FL Peak Controller LFO (Image-Line)
- EQ: PRO-Q3 (FabFilter)
License
This model is licensed under the Stability AI Community License. It is available for non-commercial use or limited commercial use by entities with annual revenues below USD $1M. For revenues exceeding USD $1M, please refer to the LICENSE for detailed terms.