slay-piano-gpt — a tiny char-level GPT for Irish jig generation (rendered to piano)
A 0.82M-parameter decoder-only Transformer trained from scratch on ~12k Irish jigs in ABC notation, sourced from thesession.org. It generates new 6/8 jig melodies one character at a time.
Built as a learning + research project for the Slayer collective. This is the "small LLM" half of an n-gram → mini-transformer comparison: the same next-token objective as a frontier LLM, at a scale you can train on a CPU in ~12 minutes.
Model details
- Architecture: decoder-only Transformer (GPT-style) — token + positional embeddings, causal multi-head self-attention, GELU MLP, residual + LayerNorm, weight-tied output head.
- Size: 4 layers, 4 heads, d_model=128, context 128 chars, vocab 52 (character-level).
- Parameters: 816,384.
- Tokenizer: character-level (52 ABC symbols) — no external tokenizer.
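The parameter count can be checked by hand. The sketch below is not taken from the repo; it assumes a GPT-2-style layout (biases on every linear layer, a 4x MLP expansion, two LayerNorms per block plus a final one, and an output head that shares the token-embedding matrix). Under those assumptions the numbers listed above add up to exactly 816,384.

```python
def gpt_param_count(n_layer=4, d_model=128, block_size=128, vocab=52, mlp_mult=4):
    """Count parameters of a GPT-2-style decoder (assumed layout, see lead-in).

    The number of heads is not an argument: heads split d_model between them
    and add no weights of their own.
    """
    tok_emb = vocab * d_model          # also reused as the tied output head
    pos_emb = block_size * d_model     # learned positional embeddings
    layer_norm = 2 * d_model           # gain + bias

    # Attention: fused QKV projection + output projection, both with bias.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)

    # MLP: expand to mlp_mult * d_model, then project back, both with bias.
    hidden = mlp_mult * d_model
    mlp = (d_model * hidden + hidden) + (hidden * d_model + d_model)

    block = 2 * layer_norm + attn + mlp
    return tok_emb + pos_emb + n_layer * block + layer_norm  # + final LayerNorm


print(gpt_param_count())  # 816384
```

Almost all of the budget sits in the four Transformer blocks (198,272 parameters each); the embeddings account for only about 23k.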
Training data
- 12,106 jigs (meter 6/8) from thesession.org, ABC notation.
- Chord-symbol annotations ("...") stripped; ornaments (~), chords ([ace]) and accidentals kept.
- ~2.45M characters, 90/10 train/val split.
- The raw dataset is not redistributed here; rebuild it with src/prepare_data.py from the thesession.org data dump, and please respect thesession.org's terms.
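As a rough illustration of the cleaning and tokenization steps described above, here is a minimal sketch. It is not the code in src/prepare_data.py: the function names are hypothetical, the regex assumes every double-quoted span in a tune is a chord symbol, and the split is a simple contiguous cut, which may differ from what the real script does.

```python
import re

# Assumption: any "..." span is a chord-symbol annotation and can be dropped.
CHORD_SYMBOL = re.compile(r'"[^"]*"')


def strip_chord_symbols(abc: str) -> str:
    """Remove quoted chord symbols; ornaments (~), [ace] chords, accidentals stay."""
    return CHORD_SYMBOL.sub("", abc)


def build_vocab(text: str):
    """Character-level vocabulary: one integer id per distinct character."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text: str, stoi) -> list:
    return [stoi[ch] for ch in text]


def decode(ids, itos) -> str:
    return "".join(itos[i] for i in ids)


def split_train_val(ids, val_frac=0.1):
    """Contiguous split: the last val_frac of the ids becomes validation data."""
    n_val = int(len(ids) * val_frac)
    cut = len(ids) - n_val
    return ids[:cut], ids[cut:]


# Made-up example tune, not from the dataset.
tune = 'X:1\nM:6/8\nK:G\n"G"~G3 [ace]2d|"D7"^f2e dBA|\n'
clean = strip_chord_symbols(tune)
stoi, itos = build_vocab(clean)
ids = encode(clean, stoi)
train_ids, val_ids = split_train_val(ids)
```

Running `build_vocab` over the full cleaned corpus is what would yield the 52-symbol vocabulary; on the single example tune above it is much smaller.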
Training
- Objective: next-character cross-entropy.
- Optimizer: AdamW (lr=3e-4, wd=0.1), grad-clip 1.0, warmup + cosine decay.
- 2000 iterations, batch 32, block 128, CPU.
- Best validation loss: 1.335 → perplexity 3.80. Train ≈ val (no overfitting).
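For readers new to these terms, the sketch below shows one common way to implement "warmup + cosine decay" and how the reported perplexity follows from the loss. The peak learning rate (3e-4) and the 2000 iterations come from the list above; `warmup_steps=100` and `min_lr=3e-5` are placeholder assumptions, since the README does not state them, and the real training script may shape its schedule differently.

```python
import math


def lr_at(step, max_lr=3e-4, total_steps=2000, warmup_steps=100, min_lr=3e-5):
    """Learning rate at a given step: linear warmup, then cosine decay.

    warmup_steps and min_lr are illustrative assumptions, not repo values.
    """
    if step < warmup_steps:
        # Linear ramp from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes 1 -> 0
    return min_lr + (max_lr - min_lr) * cosine


def perplexity(cross_entropy_nats):
    """Perplexity is the exponential of the mean cross-entropy (in nats)."""
    return math.exp(cross_entropy_nats)


print(f"{perplexity(1.335):.2f}")  # 3.80
```

A perplexity of 3.80 means the model is, on average, about as uncertain as if it were choosing uniformly among roughly four characters at each step, out of a vocabulary of 52.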
Usage
pip install torch music21
python src/make_midi.py --key G --n 3 --out out # -> ABC + MIDI (piano)
Or load directly:
import torch
from gpt import GPT
# weights_only=False does a full unpickle; only load checkpoints you trust
ck = torch.load("data/gpt_ckpt.pt", weights_only=False)
model = GPT(ck["config"])
model.load_state_dict(ck["model"])
model.eval()  # inference mode
# seed "X:1\nM:6/8\nK:D\n" -> model.generate(...)
Results
Compared against a character-level n-gram (order-6) baseline on the same corpus. The Transformer reaches lower perplexity and noticeably more coherent phrasing, thanks to its 128-character attention window vs. the n-gram's 6-character context. Full write-up: Slayer research blog.
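To make the baseline concrete, here is a minimal sketch of a character-level n-gram model and its perplexity. It is not the baseline code behind the reported comparison: it assumes "order-6" means six characters of context (as the paragraph above puts it), uses add-alpha smoothing, and pads the start of the text with newlines. All of those are illustrative choices.

```python
import math
from collections import Counter, defaultdict


def train_ngram(text, order=6):
    """Count next-character frequencies for every `order`-character context."""
    counts = defaultdict(Counter)
    padded = "\n" * order + text  # assumed padding so the first chars have a context
    for i in range(order, len(padded)):
        counts[padded[i - order:i]][padded[i]] += 1
    return counts


def ngram_perplexity(counts, text, vocab_size, order=6, alpha=1.0):
    """exp(mean negative log-likelihood) with add-alpha smoothing.

    Expects non-empty text.
    """
    padded = "\n" * order + text
    nll = 0.0
    for i in range(order, len(padded)):
        ctx_counts = counts.get(padded[i - order:i], {})
        total = sum(ctx_counts.values())
        p = (ctx_counts.get(padded[i], 0) + alpha) / (total + alpha * vocab_size)
        nll -= math.log(p)
    return math.exp(nll / len(text))


# Toy check on a made-up repetitive string, not on the jig corpus.
toy = "abc" * 50
toy_counts = train_ngram(toy, order=2)
toy_ppl = ngram_perplexity(toy_counts, toy, vocab_size=3, order=2)
```

The n-gram model can only condition on a fixed, short context, and any context it has never seen falls back to a near-uniform guess. The Transformer's attention can draw on up to 128 previous characters, which is the difference the comparison above points to.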
Limitations (honest)
- Tiny model: captures local/phrase structure, not long-form musical form (no reliable AABB themes).
- ~128-character memory; no global planning.
- MIDI rendering is mechanical (fixed velocity, no pedal/phrasing).
- Occasional malformed repeat markers in raw output.
- Trained only on 6/8 jigs — don't expect other styles.
Acknowledgements
- Data: the thesession.org community.
- Built by Arkadiusz Słota for the Slayer collective. Educational / research project.
License
MIT (code & weights). Training data belongs to thesession.org contributors.