TheArtist Music Transformer — LoRA Adapter (Gospel)

LoRA adapter that conditions the F1 base (PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80) toward gospel chord progressions.

This model was presented in the paper How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Boundaries in Multi-Genre Chord-Symbol Modeling.

One of eleven per-genre adapters released alongside the paper. This release is the best-rank snapshot from a 5-point rank sweep (r ∈ {4, 8, 16, 32, 64}); see §Rank sweep below for the full table and selection criterion.

GitHub: PearlLeeStudio/TheArtist
Project Page: @StudioPearlLee

Demo

Watch TheArtist in action on YouTube — interactive staff editor, MIDI input, AI generation with live progress, and per-genre LoRA playback across the 13-genre vocabulary.

Base checkpoint note (2026-06-11): This adapter was trained on the released F1 base checkpoint. Full SHA-256 weight-hash verification (2026-06-11) shows that this checkpoint is identical to the pop-only Phase-0 baseline, because best-checkpoint selection in the F1 run silently retained the pre-fine-tuning weights (see the Erratum on the base card). The adapter and all evaluation numbers on this card are consistent with that released base: every "F1 base" column was measured against the exact weights this adapter was trained on and is served with. The adaptation gains shown here are therefore gains over a pure-pop harmonic prior.
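The exact hashing procedure behind the erratum is not documented on this card. The sketch below shows one plausible way to run a full weight-hash comparison, assuming both checkpoints are plain PyTorch state dicts; the helper name state_dict_sha256 is hypothetical. It hashes every tensor's name, dtype, shape, and raw bytes in sorted key order, so two checkpoints share a digest only if their weights are bit-identical.

```python
import hashlib

import torch


def state_dict_sha256(state_dict: dict) -> str:
    """Hypothetical helper: one SHA-256 digest over every tensor in a state dict."""
    h = hashlib.sha256()
    for name in sorted(state_dict):  # fixed order, independent of insertion order
        t = state_dict[name].detach().cpu().contiguous()
        h.update(name.encode("utf-8"))
        h.update(str(t.dtype).encode("utf-8"))
        h.update(str(tuple(t.shape)).encode("utf-8"))
        # Flatten (handles 0-dim buffers) and reinterpret as raw bytes, so the
        # digest covers the exact bit pattern, including bf16/fp16 tensors.
        h.update(t.reshape(-1).view(torch.uint8).numpy().tobytes())
    return h.hexdigest()
```

Calling it on the model_state_dict entry of each checkpoint and comparing the two digests answers the same question as the erratum: are the weights bit-identical, regardless of how the files were pickled?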

Adapter summary

Field	Value
Base model	`PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80` (F1, 25.6M params)
Adapter type	LoRA (Q/K/V projections)
LoRA rank	4
LoRA alpha	8
LoRA dropout	0.05
Target modules	`w_q`, `w_k`, `w_v`
Trainable parameters	~106,496 (~0.41% of base)
Adapter file size	~0.4 MB
Base vocabulary	351 tokens (jazz/pop)
Vocabulary extension	+8 genre tokens (`embedding_extension.pt`)
Training epochs	8
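The trainable-parameter and file-size figures can be checked by hand. This is a sketch under our own assumptions: d_model = 512 and n_layers = 8 (taken from the usage snippet), each of w_q / w_k / w_v is a square 512 × 512 projection, the count also includes the 8 new rows of token_emb and out_proj, and the adapter file stores fp32 weights. Under those assumptions the arithmetic lands exactly on 106,496.

```python
D_MODEL = 512
N_LAYERS = 8
TARGETS = ("w_q", "w_k", "w_v")
RANK = 4
NEW_TOKENS = 8
BASE_PARAMS = 25.6e6  # "25.6M params" from the summary table (rounded)

# LoRA adds A (r x d_in) and B (d_out x r) to each targeted projection.
lora_per_module = RANK * D_MODEL + D_MODEL * RANK
lora_params = lora_per_module * len(TARGETS) * N_LAYERS

# Assumed: the 8 new genre rows in token_emb and out_proj are trainable too.
extension_params = 2 * NEW_TOKENS * D_MODEL

trainable = lora_params + extension_params
print(f"LoRA weights:       {lora_params:,}")
print(f"Embedding rows:     {extension_params:,}")
print(f"Total trainable:    {trainable:,} ({trainable / BASE_PARAMS:.3%} of base)")
print(f"Adapter size, fp32: {lora_params * 4 / 1e6:.2f} MB")
```

The breakdown is 98,304 LoRA weights plus 8,192 embedding-row weights; the LoRA part alone at 4 bytes per weight is about 0.39 MB, matching the ~0.4 MB adapter file.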

Training data

Source

3,747 chord-progression sequences in the gospel subset of the Chordonomicon dataset. Chordonomicon is licensed CC BY-NC 4.0; see the dataset card for full terms.

Filter rule

genres contains any of {gospel}

(See ai/training/extract_genre_subsets.py:GENRE_FILTERS for the full extraction logic: a main rule matches the main_genre column, and a genres_any rule substring-matches the free-form genres column. Each song is assigned to its first matching genre, so no song is counted under two genres.)
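The two rule types and first-match assignment can be sketched as follows. The real GENRE_FILTERS table is not reproduced on this card: the dictionary layout, the assign_genre helper, exact equality for main, lowercase matching, and the example rows are all hypothetical, chosen only to show why first-match assignment prevents double counting.

```python
from typing import Optional

# Hypothetical layout: genre -> rule; dict order doubles as priority order.
GENRE_FILTERS = {
    "jazz": {"main": "jazz"},              # assumed exact match on main_genre
    "gospel": {"genres_any": ["gospel"]},  # substring match on free-form genres
}


def assign_genre(row: dict, filters: dict = GENRE_FILTERS) -> Optional[str]:
    """Return the first genre whose rule matches the row, or None."""
    main = (row.get("main_genre") or "").lower()
    genres = (row.get("genres") or "").lower()
    for genre, rule in filters.items():
        if "main" in rule and main == rule["main"]:
            return genre
        if any(term in genres for term in rule.get("genres_any", [])):
            return genre
    return None


rows = [
    {"main_genre": "pop", "genres": "['southern gospel', 'country']"},
    {"main_genre": "jazz", "genres": "['gospel', 'soul jazz']"},
    {"main_genre": "rock", "genres": "['hard rock']"},
]
print([assign_genre(r) for r in rows])  # ['gospel', 'jazz', None]
```

The second row mentions gospel but is claimed by the earlier jazz rule, so it lands in exactly one subset.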

Splits (song-level, seed=42, 80/10/10)

Partition	Songs	Used for
train	2,997	this LoRA's training (12-key augmented → 35,964 sequences)
val	374	rank-sweep eval + best-epoch selection during training
test	376	held aside for future paired analysis
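A song-level split and 12-key augmentation can be sketched as below. This is an assumption, not the repository's code: song_level_split and transpose_chord are hypothetical helpers, and the random.Random(42) shuffle with int() truncation is our choice. That rounding happens to reproduce the 2,997 / 374 / 376 counts above, but the actual song-to-partition assignment in ai/data/splits/ will differ unless the shuffle matches exactly. Splitting at the song level, before augmentation is applied to train, means no transposed copy of a val or test song can appear in training. The transposition helper spells results with sharps; the project's own chord spelling may differ, and the song's key field would need transposing too.

```python
import random

LETTER_PITCH = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
SHARP_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def song_level_split(song_ids, seed=42, train_frac=0.8, val_frac=0.1):
    """Hypothetical helper: shuffle song IDs once, then cut into train/val/test."""
    ids = list(song_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(train_frac * len(ids))
    n_val = int(val_frac * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def _split_root(text):
    """Split 'F#m7' into ('F#', 'm7'): a letter plus an optional # or b."""
    n = 2 if len(text) > 1 and text[1] in "#b" else 1
    return text[:n], text[n:]


def _shift(note, semitones):
    pitch = LETTER_PITCH[note[0]] + note[1:].count("#") - note[1:].count("b")
    return SHARP_NAMES[(pitch + semitones) % 12]


def transpose_chord(symbol, semitones):
    """Hypothetical helper: transpose a chord's root and slash bass."""
    head, sep, bass = symbol.partition("/")
    root, quality = _split_root(head)
    out = _shift(root, semitones) + quality
    if sep and bass[:1] in LETTER_PITCH:  # slash chord such as C/E
        bass_root, bass_rest = _split_root(bass)
        out += "/" + _shift(bass_root, semitones) + bass_rest
    elif sep:  # '/' belongs to the quality, e.g. C6/9
        out += "/" + bass
    return out


train, val, test = song_level_split(range(3747))
print(len(train), len(val), len(test))  # 2997 374 376

# Train only: every song appears in all 12 transpositions.
print(len(train) * 12)  # 35964
print(transpose_chord("Bb7", 2), transpose_chord("C/E", 1))  # C7 C#/F
```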

Vocabulary

Base: 351 tokens (jazz/pop chord vocab from the F1 base model)
Extension: +8 [GENRE:X] tokens covering 8 new genres (this LoRA adds the [GENRE:gospel] token)
Final vocab: 359 tokens (stored alongside the adapter in embedding_extension.pt)

Reproducibility

# 1. Pull the raw Chordonomicon CSV into ai/data/raw/chordonomicon/
# 2. Extract this genre subset
uv run python ai/training/extract_genre_subsets.py --genres gospel --merge

# 3. Train the LoRA at the released rank
uv run python ai/training/lora_train.py --config ai/training/configs/lora/gospel_r4.yaml

Hyperparameters: 8 epochs · batch 32 × accum 2 · lr 3e-4 · 1-epoch warmup · AMP fp16 · best.pt selected by min val_loss.
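Batch 32 with 2 accumulation steps gives an effective batch of 64 sequences per optimizer step. The sketch below turns that into step counts for the 35,964-sequence augmented train set and a per-step learning rate. Assumptions: the dataloader keeps the last partial batch, a trailing odd micro-batch still triggers an optimizer step, warmup is linear over the first epoch, and the post-warmup decay is cosine to zero. The card only states "1-epoch warmup", so the decay shape in particular is a guess, and lr_at is a hypothetical helper.

```python
import math

N_SEQUENCES = 35_964  # 2,997 train songs x 12 keys
MICRO_BATCH = 32
ACCUM_STEPS = 2
EPOCHS = 8
BASE_LR = 3e-4

micro_batches_per_epoch = math.ceil(N_SEQUENCES / MICRO_BATCH)
steps_per_epoch = math.ceil(micro_batches_per_epoch / ACCUM_STEPS)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = steps_per_epoch  # "1-epoch warmup"


def lr_at(step: int) -> float:
    """Linear warmup over one epoch, then (assumed) cosine decay to zero."""
    if step < warmup_steps:
        return BASE_LR * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps, warmup_steps)  # 562 4496 562
print(f"{lr_at(0):.2e} {lr_at(warmup_steps - 1):.2e} {lr_at(total_steps - 1):.2e}")
```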

Genre character

Gospel cadences, plagal motion, and gospel substitutions

Rank sweep

The released adapter is the best-rank snapshot from training the same LoRA recipe at five different ranks. Every cell uses the same F1 base, same val split, same evaluate() call, and the same [GENRE:none]-initialized embedding extension — only lora_r (and lora_alpha = 2 × lora_r) changes. Numbers are validation-set token-level metrics (no key augmentation).
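Holding lora_alpha = 2 × lora_r fixes the LoRA scaling factor alpha / r at 2 for every rank, so the sweep changes the rank of the update B·A while the factor multiplying it stays constant. The sketch below is a generic LoRA linear layer written from scratch for illustration; it is not the PEFT implementation the project actually uses, and the class name LoRALinear and the init choices (small Gaussian A, zero B) are our assumptions following the usual LoRA recipe. It also shows the per-module trainable count r × (d_in + d_out) growing with rank.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base projection plus a trainable rank-r update scaled by alpha / r."""

    def __init__(self, base: nn.Linear, r: int, alpha: float, dropout: float = 0.05):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only A and B are trained
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # update starts at 0
        self.dropout = nn.Dropout(dropout)
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        update = self.dropout(x) @ self.lora_A.T @ self.lora_B.T
        return self.base(x) + self.scaling * update


torch.manual_seed(0)
for r in (4, 8, 16, 32, 64):
    layer = LoRALinear(nn.Linear(512, 512, bias=False), r=r, alpha=2 * r)
    n_trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(f"r={r:>2}  scaling={layer.scaling}  trainable/module={n_trainable:,}")
```

Because B starts at zero, each wrapped projection initially reproduces the frozen base exactly, so every rank in the sweep starts from the same F1 behaviour.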

Rank	val_loss	val_top1 (%)	val_top5 (%)	Δtop1 vs F1
r=4	0.6092	83.20	96.63	+3.86 ← selected
r=8	0.6116	82.29	95.68	+2.95
r=16	0.6115	82.31	95.65	+2.97
r=32	0.6116	82.17	96.62	+2.83
r=64	0.6131	82.26	95.67	+2.92

Selection criterion: minimum validation cross-entropy loss; val_top1 as tiebreaker. val_loss is what the training loop optimizes and what selects each rank's best.pt epoch, so using it for cross-rank selection keeps consistency with how each individual checkpoint was chosen.
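Applied to the five rows of the rank-sweep table, the selection rule is a single sort key: val_loss ascending, then val_top1 descending. The tuple layout below is ours; the numbers are copied from the table.

```python
# (rank, val_loss, val_top1) from the rank-sweep table.
sweep = [
    (4, 0.6092, 83.20),
    (8, 0.6116, 82.29),
    (16, 0.6115, 82.31),
    (32, 0.6116, 82.17),
    (64, 0.6131, 82.26),
]

# Minimum val_loss first; on equal loss, the higher val_top1 wins.
best_rank, best_loss, best_top1 = min(sweep, key=lambda row: (row[1], -row[2]))
print(best_rank)  # 4

# The tiebreaker decides between r=8 and r=32, which share val_loss 0.6116.
tied = [row for row in sweep if row[1] == 0.6116]
print(min(tied, key=lambda row: (row[1], -row[2]))[0])  # 8
```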

Full 11-genre × 5-rank sweep + full-FT anchor table: ai/results/lora_rank_sweep.md in the repo.

Evaluation

Validation token-level metrics on the genre-specific val split (374 sequences, no key augmentation). The F1 base column uses the same val split, same dataloader, and the same [GENRE:none]-initialized embedding-extension setup as the LoRA run — only the LoRA parameters and the trained embedding rows differ.

Metric	F1 base alone	F1 + this LoRA	Δ
Top-1 accuracy (%)	79.34	83.20	+3.86
Top-5 accuracy (%)	94.73	96.63	+1.90
Cross-entropy loss	0.8813	0.6092	-0.2721

Source: ai/results/f1_per_genre_baseline.csv + ai/results/lora_rank_sweep.csv. Higher top-1/top-5 and lower loss are better.

Real-song eval

Mean validation top-1 / top-5 / cross-entropy on 10 real gospel songs from ai/data/eval_real_songs.jsonl, drawn from the held-out ai/data/splits/{val,test}.jsonl partitions (see docs/EVAL.md for dataset composition and methodology). The eval is teacher-forced and uses the same evaluate() call as the full-val rank-sweep eval above, narrowed to a curated 10-song subset.

Model	Top-1 (%)	Top-5 (%)	val_loss
F1 base alone	79.79	96.85	0.7367
F1 + this LoRA	82.85	96.64	0.6071
Δ	+3.06	-0.21	-0.1296

Evaluation data

This adapter is evaluated on two complementary held-out sets, both drawn from the val and test partitions, which were excluded from LoRA training (val is used only for best-epoch and rank selection):

1. Full val split — used for the rank sweep table above

Size: 374 validation sequences (this genre's val partition)
Methodology: teacher-forced next-token CE / top-1 / top-5 with pad_id masking, batch 32, no key augmentation
Comparison fairness: same evaluate() call as ai/results/f1_per_genre_baseline.csv, same dataloader, same [GENRE:none]-initialized embedding-extension setup. Only the LoRA adapter weights and the 8 new genre embedding rows differ.
Output: ai/results/lora_rank_sweep.csv (long format, one row per (genre, rank) cell)
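The teacher-forced metrics above can be sketched as follows. This is not the repository's evaluate(): the function name, the assumption that the model returns logits of shape (batch, seq, vocab), the shift-by-one target layout, and token-weighted averaging across batches are ours. PAD targets are excluded from both the loss and the accuracy denominators, and a position counts toward top-k accuracy when the true next token is among the k highest logits.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def teacher_forced_metrics(model, batches, pad_id: int) -> dict:
    """Token-weighted CE and top-1/top-5 accuracy over non-PAD next-token targets."""
    total_loss, total_tokens, top1, top5 = 0.0, 0, 0, 0
    for ids in batches:  # ids: LongTensor of shape (batch, seq)
        inputs, targets = ids[:, :-1], ids[:, 1:]
        logits = model(inputs)  # assumed shape: (batch, seq - 1, vocab)
        mask = targets != pad_id  # pad_id masking
        flat_logits = logits[mask]  # (n_tokens, vocab)
        flat_targets = targets[mask]  # (n_tokens,)
        total_loss += F.cross_entropy(flat_logits, flat_targets, reduction="sum").item()
        k = min(5, flat_logits.size(-1))
        hits = flat_logits.topk(k, dim=-1).indices == flat_targets.unsqueeze(-1)
        top1 += hits[:, 0].sum().item()
        top5 += hits.any(dim=-1).sum().item()
        total_tokens += flat_targets.numel()
    return {
        "val_loss": total_loss / total_tokens,
        "val_top1": 100.0 * top1 / total_tokens,
        "val_top5": 100.0 * top5 / total_tokens,
    }
```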

2. Curated 130-song real-song eval — used for the Real-song eval section above

Size: 10 songs from this genre (10 per genre × 13 genres = 130 total)
Source partition: drawn from splits/val.jsonl + splits/test.jsonl only (no train leakage)
Per-genre sources: chordonomicon_gospel
Title coverage (this genre): 0 of 10 are named real songs; all 10 are Chordonomicon entries whose title field is a Spotify track ID, per upstream dataset policy
Bar range (this genre): 24–85 bars (≈ 98s avg at typical tempo for this genre)
Build script: ai/training/build_eval_real_songs.py --seed 42 --per-genre 10 — deterministic, re-runnable
Output: ai/results/real_song_eval.csv (18 models × 130 songs, long format)
Full dataset composition + per-source license + methodology: see docs/EVAL.md
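The build script is described as deterministic and re-runnable with --seed 42 --per-genre 10. A minimal sketch of that kind of sampling is shown below, assuming records carry hypothetical "id", "genre", and "split" fields and that each genre gets an independent seeded draw from its val and test pool. It is not build_eval_real_songs.py itself, and any curation the real script applies is omitted.

```python
import random
from collections import defaultdict


def sample_eval_songs(records, per_genre=10, seed=42):
    """Hypothetical sketch: seeded per-genre sample drawn from val/test only."""
    pools = defaultdict(list)
    for rec in records:
        if rec["split"] in ("val", "test"):  # never draw from train
            pools[rec["genre"]].append(rec)
    picked = []
    for genre in sorted(pools):  # fixed genre order
        pool = sorted(pools[genre], key=lambda r: r["id"])  # fixed input order
        rng = random.Random(f"{seed}:{genre}")  # independent stream per genre
        picked.extend(rng.sample(pool, min(per_genre, len(pool))))
    return picked
```

Sorting each pool by ID makes the result independent of input file order, and giving each genre its own seeded stream means adding or removing a genre does not change which songs the other genres receive.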

License and use

The adapter weights are released under CC BY-NC 4.0 (matching Chordonomicon, the upstream training corpus). Permitted: research, paper replication, portfolio, demo. Not permitted: commercial deployment without separate licensing of upstream data.

Usage

Both the base repo and this LoRA repo ship the project's model.py and tokenizer.py at the repo root, so external users can load this adapter end-to-end without cloning anything from GitHub. snapshot_download materializes each repo on disk, and inserting the base repo's directory into sys.path makes its bundled model.py / tokenizer.py importable.

Required dependencies: torch, huggingface_hub, peft, safetensors.

import sys
import torch
import torch.nn as nn
from huggingface_hub import snapshot_download
from peft import PeftModel

# 1. Download the base + LoRA repos. Both bundle model.py and tokenizer.py.
base_dir = snapshot_download(repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80")
lora_dir = snapshot_download(repo_id="PearlLeeStudio/TheArtist-MusicTransformer-lora-gospel")
sys.path.insert(0, base_dir)  # so the next two imports resolve

from model import MusicTransformer
from tokenizer import ChordTokenizer

# 2. Extended tokenizer (351 base + 8 new genre tokens = 359). The PAD id
#    is unchanged across base and extended tokenizers.
tokenizer = ChordTokenizer(include_extra_genres=True)

# 3. Build the model at the BASE vocab size (351) so F1's state_dict loads
#    cleanly; we grow the embedding rows immediately after. Passing the
#    extended tokenizer's pad_id is safe because PAD is shared (see step 2).
BASE_VOCAB = 351
model = MusicTransformer(
    vocab_size=BASE_VOCAB,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
ckpt = torch.load(f"{base_dir}/best.pt", map_location="cpu", weights_only=False)
model.load_state_dict(ckpt["model_state_dict"])

# 4. Grow token_emb + out_proj from 351 -> 359 (new rows init from
#    [GENRE:none]), then overlay the LoRA's trained extension rows.
def _grow_to_extended_vocab(m, new_vocab, none_id):
    d = m.token_emb.embedding_dim
    new_emb = nn.Embedding(new_vocab, d, padding_idx=m.token_emb.padding_idx)
    with torch.no_grad():
        new_emb.weight[:m.token_emb.num_embeddings] = m.token_emb.weight
        for i in range(m.token_emb.num_embeddings, new_vocab):
            new_emb.weight[i] = m.token_emb.weight[none_id]
    m.token_emb = new_emb
    new_out = nn.Linear(d, new_vocab, bias=False)
    with torch.no_grad():
        new_out.weight[:m.out_proj.out_features] = m.out_proj.weight
        for i in range(m.out_proj.out_features, new_vocab):
            new_out.weight[i] = m.out_proj.weight[none_id]
    m.out_proj = new_out

_grow_to_extended_vocab(model, tokenizer.vocab_size, tokenizer.encode_genre("none"))

ext = torch.load(f"{lora_dir}/embedding_extension.pt",
                 map_location="cpu", weights_only=False)
model.token_emb.load_state_dict(ext["token_emb_state"])
model.out_proj.load_state_dict(ext["out_proj_state"])

# 5. Apply the LoRA adapter (the adapter files live at lora_dir/adapter/).
model = PeftModel.from_pretrained(model, f"{lora_dir}/adapter")
model.eval()

# 6. Generate a gospel continuation. With LoRA injected,
#    PeftModel.forward routes through the adapted attention layers.
song = {
    "key": "Cmaj", "time_signature": "4/4", "genre": "gospel",
    "bars": [["Cmaj7"], ["Fmaj7"]],
}
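prompt_ids = tokenizer.encode_sequence(song)[:-1]  # drop the final (presumably EOS) token so sampling continues the song
ids = torch.tensor([prompt_ids])
MAX_NEW_TOKENS, TEMPERATURE, MAX_SEQ_LEN = 32, 0.8, 256  # MAX_SEQ_LEN matches max_seq_len above
torch.manual_seed(0)  # make the sampled continuation reproducible
with torch.no_grad():
    for _ in range(MAX_NEW_TOKENS):
        logits = model(ids[:, -MAX_SEQ_LEN:])  # routed through LoRA via PeftModel; crop to the context window
        probs = torch.softmax(logits[:, -1, :] / TEMPERATURE, dim=-1)  # temperature < 1 sharpens sampling
        next_id = torch.multinomial(probs, 1)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_id:
            break
print(tokenizer.decode(ids[0].tolist()))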

Citation

Cite both the v1 mix-ratio paper (F1 base) and the v2 per-genre LoRA paper (this adapter family):

@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}

@misc{lee2026chordtimeseries,
  title         = {How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity?},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2606.07334},
  archivePrefix = {arXiv}
}

Papers for PearlLeeStudio/TheArtist-MusicTransformer-lora-gospel

How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Boundaries in Multi-Genre Chord-Symbol Modeling

Paper • 2606.07334 • Published 10 days ago

Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation

Paper • 2605.04998 • Published May 6