Expressive Jazz Piano Performance Modeling
Trained checkpoints for jazz piano performance modeling, covering two model families: two pretrained PerformanceRNN LSTM variants and two fine-tunes of the publicly released Aria 1B-parameter piano language model (Bradshaw & Colton 2025). All models are fine-tuned/trained on PiJAMA (Edwards et al. 2024). The code repository is https://github.com/napanto/jazz-piano-performance-modeling. The Aria fine-tunes also drive a real-time macOS application: https://github.com/napanto/aria-realtime-studio.
Models in this repo
The model names below match the headline table of the report.
aria-full-quality/ (Aria fine-tune, full quality, offline)
Aria 1B-parameter LLaMA-3.2-style decoder fine-tuned on the PiJAMA
hawthorne split with the default 17 727-id AbsTokenizer (no
sustain pedal). Architecture: medium (d=1536, 16 layers, 24 heads,
RoPE, GQA, max_seq_len=8192). Best swept mean OA on the test split:
0.911. FMD vs the kong test-pool reference (CLaMP-2 encoder):
272.6.
- `tested.safetensors`: the checkpoint reported on in the paper (4-stage train/val pipeline: retrained on TRAIN+VAL for the patience-selected epoch count, evaluated once on TEST).
- `deployed.safetensors`: full retrain on TRAIN+VAL+TEST for the same epoch count. For deployment and listening only; test metrics cannot honestly be reported for this checkpoint because it has seen the test set.
aria-real-time/ (Aria fine-tune, real-time, MLX-compatible)
Same backbone but loaded from the public model-demo.safetensors
checkpoint with the residual-stream embedding-projection layer
preserved (medium-emb architecture, +1536×512 emb_proj). Trained on
the PiJAMA kong album-aware split with the 2 675-id demo tokenizer
that adds explicit sustain-pedal events. Drop-in jazz replacement for
the upstream aria/demo/demo_mlx.py iOS sampler. Best swept mean OA:
0.804. FMD: 233.6.
- `tested.safetensors`, `deployed.safetensors`: same conventions as above.
lstm-hawthorne/ (pretrained PerformanceRNN, hawthorne split)
3-layer stacked LSTM (hidden 512, embed 512, tied I/O head, 6.46M params; paper-faithful PerformanceRNN, Oore et al. 2018). Pretrained on the 820 944-file Aria-MIDI corpus (30k steps) and fine-tuned on the PiJAMA hawthorne split with the 413-id no-pedal vocabulary. Best swept mean OA: 0.664. FMD: 427.8.
- `tested.pt`: Stage-B equivalent (the checkpoint reported on).
lstm-kong-pedal/ (pretrained PerformanceRNN, kong+pedal split)
Same architecture but with the 314-id pedal-aware vocabulary (NOTE_ON×88 + NOTE_OFF×88 + TIME_SHIFT×100 + VELOCITY×32 + SUSTAIN_ON/OFF + 4 specials). Fine-tuned on the PiJAMA kong album-aware split. Best swept mean OA: 0.768 (only ≈0.04 below Aria real-time despite a ~150× parameter ratio). FMD: 438.6.
- `tested.pt` (same convention as lstm-hawthorne).
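The 314-id total can be checked from its components. The sketch below lays the pedal-aware vocabulary out as contiguous id ranges; the group sizes come from the description above, but the ordering of the groups, the special-token names (`<PAD>`, `<BOS>`, `<EOS>`, `<UNK>`) and the helper names are hypothetical, not the layout used in the code repository.

```python
# Hypothetical id layout for a 314-id pedal-aware PerformanceRNN vocabulary.
# Group sizes are from the model card; group order and special names are assumed.
SPECIALS = ["<PAD>", "<BOS>", "<EOS>", "<UNK>"]  # 4 specials (names assumed)
GROUPS = [
    ("NOTE_ON", 88),      # one per piano key
    ("NOTE_OFF", 88),
    ("TIME_SHIFT", 100),  # quantised time-shift bins
    ("VELOCITY", 32),     # velocity bins
    ("SUSTAIN_ON", 1),
    ("SUSTAIN_OFF", 1),
]


def build_vocab():
    """Return (token -> id, id -> token): specials first, then each group in order."""
    id_to_token = list(SPECIALS)
    for name, size in GROUPS:
        if size == 1:
            id_to_token.append(name)
        else:
            id_to_token.extend(f"{name}_{i}" for i in range(size))
    token_to_id = {tok: i for i, tok in enumerate(id_to_token)}
    return token_to_id, id_to_token


def note_on_id(token_to_id, midi_pitch):
    """Map a MIDI pitch on the 88-key piano (A0 = 21 .. C8 = 108) to its NOTE_ON id."""
    if not 21 <= midi_pitch <= 108:
        raise ValueError("pitch outside the 88-key piano range")
    return token_to_id[f"NOTE_ON_{midi_pitch - 21}"]


token_to_id, id_to_token = build_vocab()
print(len(id_to_token))             # 314
print(note_on_id(token_to_id, 60))  # 43 (middle C under this assumed layout)
```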
Note: no full retrain on TRAIN+VAL+TEST (a `deployed` checkpoint) is provided for the LSTMs. The 4-stage generalisation-honest pipeline already gives a strong deployment baseline, and the LSTMs are cheap enough to train that, if you need that variant, the training script in the code repository reproduces it in ≈25 minutes on a single NVIDIA B200 (or comparable GPU).
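The "patience-selected epoch count" that the TRAIN+VAL and TRAIN+VAL+TEST retrains reuse can be illustrated with a standard early-stopping rule. This is a minimal sketch assuming validation loss is checked once per epoch and the scan stops after `patience` epochs without improvement; the function name, the `patience` and `min_delta` values and the loss curve are hypothetical, not the repository's settings.

```python
def select_epoch_count(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch with the best validation loss, stopping the
    scan once `patience` consecutive epochs fail to improve by more than min_delta."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # early stop: later epochs are never inspected
    return best_epoch


# Train on TRAIN while monitoring VAL, and keep only the chosen epoch count.
val_curve = [2.10, 1.85, 1.72, 1.70, 1.71, 1.73, 1.69, 1.74]  # made-up curve
n_epochs = select_epoch_count(val_curve, patience=2)
print(n_epochs)  # 4 (the scan stops at epoch 6, so epoch 7 is never seen)
# The retrain on TRAIN+VAL then runs for exactly n_epochs, with no VAL monitoring.
```

Fixing the epoch count, rather than early-stopping again, is what allows the retrain to absorb the validation data: there is no held-out set left to monitor.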
MLX variants for macOS inference
Each Aria model also has mlx-tested/ and mlx-deployed/ directories
containing:
- `model.safetensors`: the same weights as the top-level safetensors, laid out for loading via `mlx.core.load()` on Apple silicon.
- `config.json`: the corresponding Aria model config (`medium.json` for full-quality, `medium-emb.json` for real-time).
- For `aria-real-time/mlx-*` only: `tokenizer-config.json`, the same 2 675-id demo tokenizer that the upstream `aria/demo/demo_mlx.py` uses.
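Because the `medium` and `medium-emb` checkpoints differ by the extra `emb_proj` weights, one quick way to check which architecture a `model.safetensors` file expects is to read its header. The sketch below parses the safetensors header with the standard library only (an 8-byte little-endian header length followed by a JSON table of tensor names, dtypes, shapes and byte offsets). Matching on the substring `emb_proj` is an assumption about the parameter naming, and the helper names are hypothetical.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file (tensor name -> info dict)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def guess_aria_arch(path):
    """Guess the Aria config name from tensor names (naming is an assumption)."""
    names = read_safetensors_header(path)
    return "medium-emb" if any("emb_proj" in name for name in names) else "medium"


def total_tensor_bytes(path):
    """Total tensor payload size in bytes, summed from each entry's data_offsets."""
    header = read_safetensors_header(path)
    return sum(info["data_offsets"][1] - info["data_offsets"][0] for info in header.values())
```

For example, `guess_aria_arch("aria-real-time/mlx-tested/model.safetensors")` should report `medium-emb` if the naming assumption holds, and `total_tensor_bytes` gives the on-disk weight size without loading any tensors.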
Running on macOS
aria-real-time/mlx-tested/ is a drop-in replacement for the
weights expected by the upstream aria/demo/demo_mlx.py (iOS / Apple
silicon real-time sampler from EleutherAI/aria). Point that script at
model.safetensors and use the bundled tokenizer-config.json:
```bash
python aria/demo/demo_mlx.py \
    --checkpoint-path /path/to/mlx-tested/model.safetensors \
    --tokenizer-config /path/to/mlx-tested/tokenizer-config.json
```
aria-full-quality/mlx-*/ ships the full-quality weights and the
medium.json config. The upstream demo_mlx.py hardcodes the
medium-emb arch, so to run these checkpoints on macOS you can either:
- adapt `aria.inference.model_mlx.TransformerLM` to load `medium` instead of `medium-emb` (drop the `emb_proj` layer), or
- run inference via PyTorch with the MPS backend, using the top-level `tested.safetensors` / `deployed.safetensors` and the default `AbsTokenizer` (no demo tokenizer config needed).
The full-quality checkpoints are ≈2.5 GB in bf16, so they fit easily for inference on Apple silicon with ≥16 GB of unified memory.
Loading from Python
Aria (any variant) on CUDA / ROCm / MPS
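```python
import torch
from aria.config import load_model_config
from aria.model import ModelConfig, TransformerLM
from safetensors.torch import load_file

model_config = ModelConfig(**load_model_config("medium"))  # or "medium-emb" for aria-real-time
model_config.set_vocab_size(17727)  # or 2675 for aria-real-time (demo tokenizer)
model = TransformerLM(model_config)
# strict=False tolerates key mismatches; print them so a wrong config is noticed
result = model.load_state_dict(load_file("tested.safetensors"), strict=False)
print("missing:", result.missing_keys, "unexpected:", result.unexpected_keys)
# ROCm builds of PyTorch also report through torch.cuda; otherwise try MPS, then CPU
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
model = model.to(device).eval()
```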
Aria on MLX (Apple silicon)
```python
import mlx.core as mx

# mx.load on a .safetensors file returns a dict of parameter name -> mx.array
weights = mx.load("mlx-tested/model.safetensors")
# … then build the MLX TransformerLM as in aria.inference.model_mlx
```
LSTM
```python
import torch
from src.models.performancernn_lstm import PerformanceRNNLSTM, PerformanceRNNLSTMConfig

# weights_only=False unpickles arbitrary objects: only use it on checkpoints you trust
ckpt = torch.load("tested.pt", map_location="cpu", weights_only=False)
cfg = PerformanceRNNLSTMConfig(**ckpt["config"])
model = PerformanceRNNLSTM(cfg)
model.load_state_dict(ckpt["model_state"], strict=True)
model.eval()
```
(The PerformanceRNNLSTM / PerformanceRNNLSTMConfig definitions live
in the code repository under src/models/performancernn_lstm.py.)
Recommended sampling settings
For the four autoregressive models the Stage-C sweep covered the 12
cells T ∈ {0.8, 1.0, 1.2} × top-k ∈ {0, 24} × min-p ∈ {0.035, 0.05};
the same cell (temperature = 1.2, top-k = 0, i.e. no truncation,
min-p = 0.035) wins on both Mean OA and FMD for every autoregressive
model in this repo. A wider post-training sweep extended temperature to
1.8 on a common kong reference and found the OA optimum lies above 1.2
(the real-time model peaks at T = 1.4).
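The three knobs in the sweep are usually composed as follows: divide the logits by T, optionally keep only the k most probable tokens (k = 0 disables this), drop tokens whose probability is below min-p times the most likely token's probability, then renormalise and sample. This is a minimal pure-Python sketch of that common convention; the filter order, the function name and the example logits are assumptions, not the sampler used in the sweep.

```python
import math
import random
from itertools import product

# The 12 (T, top_k, min_p) cells of the sweep described above.
SWEEP_GRID = list(product([0.8, 1.0, 1.2], [0, 24], [0.035, 0.05]))


def sample_token(logits, temperature=1.2, top_k=0, min_p=0.035, rng=random):
    """Sample an index from `logits` with temperature, top-k and min-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: divide logits by T before the softmax (T > 1 flattens).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    probs = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(probs)
    probs = [p / total for p in probs]

    candidates = list(range(len(probs)))
    # Top-k: keep the k most probable tokens; k = 0 means no truncation.
    if top_k > 0:
        candidates = sorted(candidates, key=lambda i: probs[i], reverse=True)[:top_k]
    # Min-p: drop tokens below min_p * (probability of the most likely token).
    threshold = min_p * max(probs[i] for i in candidates)
    candidates = [i for i in candidates if probs[i] >= threshold]

    # random.choices renormalises the surviving weights internally.
    weights = [probs[i] for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(0)
token = sample_token([2.0, 1.0, 0.5, -3.0], temperature=1.2, top_k=0, min_p=0.035, rng=rng)
```

With these example logits, the last token's probability at T = 1.2 is under 2 % of the top token's, so min-p = 0.035 removes it while top-k = 0 leaves the other three in play. Raising T flattens the distribution, so fewer tokens fall under the min-p threshold.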
| Model | Best (T, top-k, min-p) | Mean OA ↑ | FMD ↓ (CLaMP-2) |
|---|---|---|---|
| aria-full-quality | (1.2, 0, 0.035) | 0.911 | 272.6 |
| aria-real-time | (1.2, 0, 0.035) | 0.804 | 233.6 |
| lstm-kong-pedal | (1.2, 0, 0.035) | 0.768 | 438.6 |
| lstm-hawthorne | (1.2, 0, 0.035) | 0.664 | 427.8 |
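For reading the FMD column: assuming the standard Fréchet-distance formulation familiar from FID, a Gaussian is fitted to the CLaMP-2 embeddings of the reference pieces (mean $\mu_r$, covariance $\Sigma_r$) and of the generated pieces ($\mu_g$, $\Sigma_g$), and the two Gaussians are compared as below. Lower values mean the generated set's embedding distribution is closer to the reference.

$$
\mathrm{FMD} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2} \right)
$$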
Three robust observations from the sweep:
- Temperature dominates. Raising `T` from 0.8 to 1.2 adds +0.18 to +0.30 absolute OA on Aria at every `(k, p)` cell and +0.28 on both LSTM splits.
- Don't truncate. `top-k = 0` (no truncation) beats `top-k = 24` by 0.03 to 0.07 OA at every `(T, p)` cell.
- `min-p` is comparatively flat between 0.035 and 0.05; the smaller value wins by a small margin everywhere.
Reproducibility
All checkpoints were produced by the pipeline scripts in the code
repository (scripts/aria_pipeline_per_variant.sh for the Aria
variants, scripts/train_performancernn_lstm_pipeline.sh for the
LSTMs). Reported metrics come from src/eval_aria_metrics.py (OA / KLD)
and scripts/fmd_eval_sweeps.py (FMD with the CLaMP-2 music encoder).
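For intuition about the OA / KLD pair: reading OA as the overlapping area of two distributions (an assumption based on the OA / KLD naming; `src/eval_aria_metrics.py` is the authority), it is the probability mass they share, so 1.0 means identical distributions. The sketch below computes OA and KL divergence between two histograms of some per-piece feature. It is a simplified histogram-based illustration: the smoothing constant, the KL direction and the function names are assumptions, and the real script may instead work on smoothed densities.

```python
import math


def normalise(counts, eps=1e-10):
    """Turn raw bin counts into a probability vector (eps keeps every bin non-zero)."""
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def overlapping_area(p_counts, q_counts):
    """Shared probability mass of two histograms over the same bins, in [0, 1]."""
    p, q = normalise(p_counts), normalise(q_counts)
    return sum(min(a, b) for a, b in zip(p, q))


def kl_divergence(p_counts, q_counts):
    """KL(P || Q) in nats for two histograms over the same bins."""
    p, q = normalise(p_counts), normalise(q_counts)
    return sum(a * math.log(a / b) for a, b in zip(p, q))


reference = [5, 10, 20, 10, 5]  # made-up feature histogram of reference pieces
generated = [2, 8, 25, 12, 3]   # made-up histogram of generated pieces
print(round(overlapping_area(reference, generated), 3))  # 0.86
print(round(kl_divergence(reference, generated), 3))
```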
License and attribution
This repository is released under CC-BY-NC-SA-4.0 (non-commercial, with attribution and share-alike), the most restrictive of the licenses of the training data and base models. Per-model provenance:
- LSTM checkpoints (`lstm-hawthorne`, `lstm-kong-pedal`) were pretrained directly on Aria-MIDI (CC-BY-NC-SA-4.0) and fine-tuned on PiJAMA (CC-BY-NC) → CC-BY-NC-SA-4.0.
- Aria fine-tunes (`aria-full-quality`, `aria-real-time`) start from the public Aria weights (Apache-2.0) and are fine-tuned on PiJAMA (CC-BY-NC) → effectively CC-BY-NC-4.0; the repository is tagged with the stricter license above.
PiJAMA and Aria-MIDI are automatic transcriptions of copyrighted recordings, and the underlying works remain under their original copyright, so use of this release is limited to non-commercial research. Please attribute and cite PiJAMA (Edwards et al. 2024), Aria-MIDI (Bradshaw & Colton 2025) and Aria (EleutherAI, Apache-2.0).
Citation
If you use these checkpoints, please cite the original PiJAMA, Aria and PerformanceRNN papers:
- Edwards, Dixon and Benetos. PiJAMA: Piano Jazz with Automatic MIDI Annotations. ISMIR Transactions, 6(1):89-102, 2024.
- Bradshaw and Colton. Scaling self-supervised representation learning for symbolic piano performance. arXiv:2506.23869, 2025.
- Oore, Simon, Dieleman, Eck, Simonyan. This Time with Feeling: Learning Expressive Musical Performance. Neural Computing and Applications, 32:955-967, 2020.