Expressive Jazz Piano Performance Modeling

Trained checkpoints for jazz piano performance modeling, covering two model families: two pretrained PerformanceRNN LSTM variants and two fine-tunes of the publicly released Aria 1B-parameter piano language model (Bradshaw & Colton 2025). All models are fine-tuned/trained on PiJAMA (Edwards et al. 2024). The code repository is https://github.com/napanto/jazz-piano-performance-modeling. The Aria fine-tunes also drive a real-time macOS application: https://github.com/napanto/aria-realtime-studio.

Models in this repo

The model names below match the headline table of the report.

aria-full-quality/: Aria fine-tune, full-quality (offline)

Aria 1B-parameter LLaMA-3.2-style decoder fine-tuned on the PiJAMA hawthorne split with the default 17 727-id AbsTokenizer (no sustain pedal). Architecture: medium (d=1536, 16 layers, 24 heads, RoPE, GQA, max_seq_len=8192). Best swept mean OA on the test split: 0.911. FMD vs the kong test-pool reference (CLaMP-2 encoder): 272.6.

  • tested.safetensors: the checkpoint reported on in the paper (4-stage train/val pipeline: retrained on TRAIN+VAL for the patience-selected epoch count, evaluated once on TEST).
  • deployed.safetensors: full retrain on TRAIN+VAL+TEST for the same epoch count. For deployment / listening only; test metrics are not honestly reportable on this checkpoint because it has seen the test set.
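As an illustration of what a "patience-selected epoch count" could look like, here is a minimal sketch of patience-based early stopping. The function name, the patience value, and the validation curve are all hypothetical; the card does not state the criterion the actual pipeline uses.

```python
def select_epoch_count(val_losses, patience):
    """Return the 1-based epoch with the lowest validation loss, stopping
    once `patience` consecutive epochs fail to improve on it."""
    best_epoch, best_loss, stale = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss, stale = epoch, loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch


# Hypothetical validation curve: improves until epoch 4, then degrades.
val_losses = [2.10, 1.85, 1.72, 1.69, 1.71, 1.74, 1.80]
n_epochs = select_epoch_count(val_losses, patience=2)
print(n_epochs)
```

Under this reading, the selected count would then be reused unchanged for the TRAIN+VAL retrain (tested) and the TRAIN+VAL+TEST retrain (deployed).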

aria-real-time/: Aria fine-tune, real-time MLX-compatible

Same backbone, but loaded from the public model-demo.safetensors checkpoint with the residual-stream embedding-projection layer preserved (medium-emb architecture, +1536×512 emb_proj). Trained on the PiJAMA kong album-aware split with the 2 675-id demo tokenizer, which adds explicit sustain-pedal events. Drop-in jazz replacement for the upstream aria/demo/demo_mlx.py iOS sampler. Best swept mean OA: 0.804. FMD: 233.6.

  • tested.safetensors, deployed.safetensors: same conventions as above.

lstm-hawthorne/: pretrained PerformanceRNN, hawthorne split

3-layer stacked LSTM (hidden 512, embed 512, tied I/O head, 6.46M params; paper-faithful PerformanceRNN, Oore et al. 2018). Pretrained on the 820 944-file Aria-MIDI corpus (30k steps) and fine-tuned on the PiJAMA hawthorne split with the 413-id no-pedal vocabulary. Best swept mean OA: 0.664. FMD: 427.8.

  • tested.pt: Stage-B equivalent (the one reported on).

lstm-kong-pedal/: pretrained PerformanceRNN, kong+pedal split

Same architecture but with the 314-id pedal-aware vocabulary (NOTE_ON×88 + NOTE_OFF×88 + TIME_SHIFT×100 + VELOCITY×32 + SUSTAIN_ON/OFF + 4 specials). Fine-tuned on the PiJAMA kong album-aware split. Best swept mean OA: 0.768 (only ≈0.04 below Aria real-time despite a ~150× parameter ratio). FMD: 438.6.

  • tested.pt
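The 314-id figure follows directly from the token classes listed above. The sketch below tallies them and lays out contiguous id ranges. The class counts come from this card, but the id ordering and the helper name id_ranges are assumptions made for illustration; the actual layout is defined in the code repository.

```python
# Token classes of the pedal-aware vocabulary, with counts from the model card.
PEDAL_VOCAB = {
    "NOTE_ON": 88,
    "NOTE_OFF": 88,
    "TIME_SHIFT": 100,
    "VELOCITY": 32,
    "SUSTAIN": 2,   # SUSTAIN_ON and SUSTAIN_OFF
    "SPECIAL": 4,
}


def id_ranges(vocab):
    """Assign contiguous id ranges in dict order (the ordering is an assumption)."""
    ranges, start = {}, 0
    for name, count in vocab.items():
        ranges[name] = range(start, start + count)
        start += count
    return ranges


vocab_size = sum(PEDAL_VOCAB.values())
ranges = id_ranges(PEDAL_VOCAB)
print(vocab_size)           # total number of token ids
print(ranges["SUSTAIN"])    # ids that would carry the two pedal events
```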

Note: a full retrain on TRAIN+VAL+TEST was not performed for the LSTMs (their compute cost is small enough that the 4-stage generalisation-honest pipeline already gives a strong deployment baseline). If you need that variant, the training script in the code repository reproduces it in ≈25 minutes on a single NVIDIA B200 (or comparable GPU).

MLX variants for macOS inference

Each Aria model also has mlx-tested/ and mlx-deployed/ directories containing:

  • model.safetensors: same weights as the top-level safetensors, laid out for loading via mlx.core.load() on Apple silicon.
  • config.json: the corresponding Aria model config (medium.json for full-quality, medium-emb.json for real-time).
  • For aria-real-time/mlx-* only: tokenizer-config.json, the same 2 675-id demo tokenizer the upstream aria/demo/demo_mlx.py uses.

Running on macOS

aria-real-time/mlx-tested/ is a drop-in replacement for the weights expected by the upstream aria/demo/demo_mlx.py (iOS / Apple silicon real-time sampler from EleutherAI/aria). Point that script at model.safetensors and use the bundled tokenizer-config.json:

python aria/demo/demo_mlx.py \
    --checkpoint-path /path/to/mlx-tested/model.safetensors \
    --tokenizer-config /path/to/mlx-tested/tokenizer-config.json

aria-full-quality/mlx-*/ ships the full-quality weights and the medium.json config. The upstream demo_mlx.py hardcodes the medium-emb arch, so to run these checkpoints on MLX you can either:

  1. Adapt aria.inference.model_mlx.TransformerLM to load medium instead of medium-emb (drop the emb_proj layer), or
  2. Run inference via PyTorch with the MPS backend on macOS, using the top-level tested.safetensors / deployed.safetensors and the default AbsTokenizer (no demo tokenizer config needed).

The full-quality checkpoints are ≈2.5 GB in bf16, so they fit easily on Apple silicon with ≥16 GB of unified memory for inference.

Loading from Python

Aria (any variant) on CUDA / ROCm / MPS

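from aria.config import load_model_config
from aria.model import ModelConfig, TransformerLM
from safetensors.torch import load_file

# Build the architecture, then size the vocabulary to match the tokenizer.
model_config = ModelConfig(**load_model_config("medium"))      # or "medium-emb"
model_config.set_vocab_size(17727)                              # or 2675 for real-time
model = TransformerLM(model_config)

# strict=False tolerates key mismatches, so print any that occurred.
result = model.load_state_dict(load_file("tested.safetensors"), strict=False)
print("missing:", result.missing_keys, "unexpected:", result.unexpected_keys)
model.eval()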

Aria on MLX (Apple silicon)

import mlx.core as mx
weights = mx.load("mlx-tested/model.safetensors")
# … then build the MLX TransformerLM as in aria.inference.model_mlx

LSTM

import torch
from src.models.performancernn_lstm import PerformanceRNNLSTM, PerformanceRNNLSTMConfig
ckpt = torch.load("tested.pt", map_location="cpu", weights_only=False)
cfg  = PerformanceRNNLSTMConfig(**ckpt["config"])
model = PerformanceRNNLSTM(cfg)
model.load_state_dict(ckpt["model_state"], strict=True)
model.eval()

(The PerformanceRNNLSTM / PerformanceRNNLSTMConfig definitions live in the code repository under src/models/performancernn_lstm.py.)

Recommended sampling settings

For the four autoregressive models the Stage-C sweep covered the 12 cells T ∈ {0.8, 1.0, 1.2} × top-k ∈ {0, 24} × min-p ∈ {0.035, 0.05}. One cell wins on both Mean OA and FMD for every autoregressive model in this repo: temperature = 1.2, top-k = 0 (no truncation), min-p = 0.035. A wider post-training sweep extended temperature to 1.8 on a common kong reference and found that the OA optimum lies above 1.2 (the real-time model peaks at T = 1.4).

Model              Best (T, top-k, min-p)   Mean OA ↑   FMD ↓ (CLaMP-2)
aria-full-quality  (1.2, 0, 0.035)          0.911       272.6
aria-real-time     (1.2, 0, 0.035)          0.804       233.6
lstm-kong-pedal    (1.2, 0, 0.035)          0.768       438.6
lstm-hawthorne     (1.2, 0, 0.035)          0.664       427.8
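The Stage-C grid described above can be enumerated in a few lines, which is handy when re-running the sweep or checking the cell count. The dictionary keys are hypothetical names chosen for this sketch and are not the argument names of the repository's sweep scripts.

```python
from itertools import product

TEMPERATURES = (0.8, 1.0, 1.2)
TOP_KS = (0, 24)            # 0 means no top-k truncation
MIN_PS = (0.035, 0.05)

# Cartesian product of the three axes: 3 x 2 x 2 cells.
grid = [
    {"temperature": t, "top_k": k, "min_p": p}
    for t, k, p in product(TEMPERATURES, TOP_KS, MIN_PS)
]

# The cell reported as best for every model in this card.
best = {"temperature": 1.2, "top_k": 0, "min_p": 0.035}

print(len(grid))
print(best in grid)
```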

Three robust observations from the sweep:

  • Temperature dominates. Bumping T from 0.8 → 1.2 buys +0.18–0.30 absolute OA on Aria at every (k, p) cell and +0.28 on both LSTM splits.
  • Don't truncate. top-k = 0 (no truncation) beats top-k = 24 by 0.03–0.07 OA at every (T, p) cell.
  • min-p is comparatively flat between 0.035 and 0.05; the smaller value wins by a small margin everywhere.
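To make the three knobs concrete, here is a minimal pure-Python sketch of temperature scaling followed by optional top-k and min-p filtering, with defaults set to the best cell above. It is not the sampler shipped in the code repository or in upstream Aria: the function names, the order in which the filters are applied, and the tie handling are assumptions.

```python
import math
import random


def filtered_probs(logits, temperature=1.2, top_k=0, min_p=0.035):
    """Temperature-scale logits, then apply optional top-k and min-p filters."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]   # numerically stable softmax
    total = sum(weights)
    probs = [w / total for w in weights]

    keep = [True] * len(probs)
    if top_k > 0:
        # Keep the top_k most probable tokens (ties at the cutoff are all kept).
        cutoff = sorted(probs, reverse=True)[min(top_k, len(probs)) - 1]
        keep = [p >= cutoff for p in probs]

    # min-p: drop tokens whose probability is below min_p times the top probability.
    threshold = min_p * max(probs)
    keep = [k and p >= threshold for k, p in zip(keep, probs)]

    # Renormalise over the surviving tokens.
    kept_mass = sum(p for p, k in zip(probs, keep) if k)
    return [p / kept_mass if k else 0.0 for p, k in zip(probs, keep)]


def sample(logits, rng=random, **kwargs):
    """Draw one token id from the filtered distribution."""
    probs = filtered_probs(logits, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy logits: the third token is far less likely and is removed by min-p.
print(filtered_probs([0.0, 0.0, -100.0], temperature=1.0))
```

With top_k = 0 only the min-p filter is active, so the number of surviving tokens adapts to how peaked the distribution is rather than being fixed in advance.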

Reproducibility

All checkpoints were produced by the pipeline scripts in the code repository (scripts/aria_pipeline_per_variant.sh for the Aria variants, scripts/train_performancernn_lstm_pipeline.sh for the LSTMs). Reported metrics come from src/eval_aria_metrics.py (OA / KLD) and scripts/fmd_eval_sweeps.py (FMD with the CLaMP-2 music encoder).
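For readers unfamiliar with FMD, the expression below is the standard Fréchet distance between two Gaussians fitted to embedding sets. It is given here on the assumption that scripts/fmd_eval_sweeps.py follows this usual formulation; the card itself does not spell out the estimator. Here \mu_r, \Sigma_r are the mean and covariance of the CLaMP-2 embeddings of the reference set, and \mu_g, \Sigma_g those of the generated set.

```latex
\mathrm{FMD} \;=\; \lVert \mu_r - \mu_g \rVert_2^2
  \;+\; \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values mean the generated embeddings are distributed more like the reference embeddings, which is why the table marks FMD with a downward arrow.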

License and attribution

This repository is released under CC-BY-NC-SA-4.0 (non-commercial, with attribution and share-alike), the most restrictive of the licenses of the training data and base models. Per-model provenance:

  • LSTM checkpoints (lstm-hawthorne, lstm-kong-pedal) were pretrained directly on Aria-MIDI (CC-BY-NC-SA-4.0) and fine-tuned on PiJAMA (CC-BY-NC) → CC-BY-NC-SA-4.0.
  • Aria fine-tunes (aria-full-quality, aria-real-time) start from the public Aria weights (Apache-2.0) and are fine-tuned on PiJAMA (CC-BY-NC) → effectively CC-BY-NC-4.0; the repository is tagged at the stricter license above.

Use is non-commercial / research only. Please attribute and cite PiJAMA (Edwards et al. 2024), Aria-MIDI (Bradshaw & Colton 2025) and Aria (EleutherAI, Apache-2.0). PiJAMA and Aria-MIDI are automatic transcriptions of copyrighted recordings; the underlying works remain under their original copyright, so this release is for non-commercial research only.

Citation

If you use these checkpoints, please cite the original PiJAMA, Aria, and PerformanceRNN papers:

  • Edwards, Dixon and Benetos. PiJAMA: Piano Jazz with Automatic MIDI Annotations. ISMIR Transactions, 6(1):89–102, 2024.
  • Bradshaw and Colton. Scaling self-supervised representation learning for symbolic piano performance. arXiv:2506.23869, 2025.
  • Oore, Simon, Dieleman, Eck, Simonyan. This Time with Feeling: Learning Expressive Musical Performance. Neural Computing and Applications, 32:955–967, 2020.