NanoMaestro is a tiny music-generation model, approximately 50 MB in size with about 13.8 million parameters, designed to run continuously in real time on almost any consumer CPU.

The model generates symbolic piano-note events, not audio. These events are rendered as piano sound by the playback system.

Try the web demo on Hugging Face Spaces. Inference runs entirely on your device through WebAssembly (WASM), so NanoMaestro generates music locally in your browser.

Model Architecture

NanoMaestro is an autoregressive event-token model with a 128-dimensional embedding, two stacked LSTM layers with 1,024 hidden units each, 0.1 dropout, and a linear output head over its 623-token vocabulary. It contains approximately 13.8 million parameters and predicts one musical event at a time while carrying its recurrent hidden and cell states forward for uninterrupted generation.
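The parameter figure can be checked by hand from the layer sizes above. The sketch below is only an arithmetic check, not the project's code: it assumes the PyTorch LSTM convention of two bias vectors per gate and an output head with a bias that is not tied to the embedding.

```python
# Layer sizes taken from the architecture description.
VOCAB, EMBED, HIDDEN, LAYERS = 623, 128, 1024, 2


def lstm_layer_params(input_size: int, hidden_size: int) -> int:
    # Four gates, each with input weights, recurrent weights and
    # two bias vectors (PyTorch convention; an assumption here).
    return 4 * hidden_size * (input_size + hidden_size + 2)


def total_params() -> int:
    embedding = VOCAB * EMBED
    lstm = sum(
        lstm_layer_params(EMBED if layer == 0 else HIDDEN, HIDDEN)
        for layer in range(LAYERS)
    )
    head = HIDDEN * VOCAB + VOCAB  # untied linear head with bias (assumption)
    return embedding + lstm + head


print(f"{total_params():,}")  # 13,841,903
```

Under these assumptions the total is about 13.8 million, with roughly 95% of the weights in the two LSTM layers.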

Training Pipeline

The training corpus contains approximately 496 million tokens from nearly 80,000 cleaned piano MIDI pieces. Tokens are divided into 256-token input sequences with next-token targets, using stride-16 windows with a rotating offset so all 16 window alignments are visited across training. The model is optimized with AdamW and cross-entropy loss using mixed-precision CUDA training, a batch size of 2,048, and validation-loss checkpointing before ONNX export and dynamic INT8 quantization.
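One way to read the windowing scheme is sketched below. It assumes the offset advances once per epoch (`epoch % stride`), which the description does not state explicitly; the function names are illustrative and not taken from the training code.

```python
from typing import Iterator, List, Sequence, Tuple


def window_starts(n_tokens: int, epoch: int, seq_len: int = 256, stride: int = 16) -> range:
    # Rotate the starting offset with the epoch so that, over `stride`
    # epochs, every alignment 0..stride-1 is used once (assumption).
    offset = epoch % stride
    # Each window needs seq_len + 1 tokens: inputs plus the shifted targets.
    last_start = n_tokens - seq_len - 1
    return range(offset, last_start + 1, stride)


def training_pairs(
    tokens: Sequence[int], epoch: int, seq_len: int = 256, stride: int = 16
) -> Iterator[Tuple[List[int], List[int]]]:
    for start in window_starts(len(tokens), epoch, seq_len, stride):
        chunk = list(tokens[start : start + seq_len + 1])
        # Targets are the inputs shifted left by one token.
        yield chunk[:-1], chunk[1:]
```

With this reading, a single epoch sees one sixteenth of all possible window positions, and sixteen consecutive epochs together cover every position in the corpus.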

Custom Event Tokenizer

MIDI timing is aligned to 64 positions per whole note, equivalent to 16 steps per quarter note, preserving rapid passages while regularizing small performance-timing variations. Each piece begins with boundary, tempo, and grid tokens; bar and position tokens place groups of notes in time; and every note is encoded as a pitch, a duration, and one of eight velocity levels. Pitches are limited to the piano range of MIDI 21-108, and durations are capped at 256 grid steps (four whole notes). The resulting vocabulary stays compact while retaining polyphony, dynamics, tempo, note lengths, and musical structure.
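The quantization described above can be illustrated with a small sketch. Several details are assumptions rather than documented behavior: a bar is taken to be one whole note (4/4), velocities are split into equal-width bins, out-of-range pitches are dropped, and very short notes are rounded up to one step. The function and field names are hypothetical.

```python
from typing import Dict, Optional

STEPS_PER_QUARTER = 16         # 64 positions per whole note
STEPS_PER_BAR = 64             # assumes one bar = one whole note (4/4)
MIN_PITCH, MAX_PITCH = 21, 108
MAX_DURATION_STEPS = 256
VELOCITY_LEVELS = 8


def to_steps(ticks: int, ticks_per_quarter: int) -> int:
    # Round to the nearest grid step using integer arithmetic.
    return (ticks * STEPS_PER_QUARTER + ticks_per_quarter // 2) // ticks_per_quarter


def quantize_note(
    start_ticks: int,
    duration_ticks: int,
    pitch: int,
    velocity: int,
    ticks_per_quarter: int = 480,
) -> Optional[Dict[str, int]]:
    if not MIN_PITCH <= pitch <= MAX_PITCH:
        return None  # outside the piano range; dropping is an assumption
    start = to_steps(start_ticks, ticks_per_quarter)
    duration = to_steps(duration_ticks, ticks_per_quarter)
    duration = min(max(duration, 1), MAX_DURATION_STEPS)
    # Map MIDI velocity 0..127 onto 8 equal-width levels (assumption).
    level = min(velocity * VELOCITY_LEVELS // 128, VELOCITY_LEVELS - 1)
    return {
        "bar": start // STEPS_PER_BAR,
        "position": start % STEPS_PER_BAR,
        "pitch": pitch,
        "duration": duration,
        "velocity_level": level,
    }
```

For example, with 480 ticks per quarter note, a middle C that starts 2,160 ticks into the piece and lasts one quarter note lands in bar 1 at position 8 with a duration of 16 steps.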

Key Features

Real-time, continuous music generation
Runs locally on consumer CPUs
Fully client-side browser inference with WASM
Compact quantized model
Symbolic piano generation with real-time playback
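
Continuous generation depends on carrying the recurrent state from one step to the next, as described under Model Architecture. The loop below is a minimal sketch of that idea only. The `step` callable, its signature, and `toy_step` are hypothetical stand-ins for the real model, and temperature sampling is an assumed decoding strategy.

```python
import math
import random
from itertools import islice
from typing import Callable, Iterator, List, Optional, Tuple

State = Optional[object]
StepFn = Callable[[int, State], Tuple[List[float], State]]


def sample(logits: List[float], temperature: float, rng: random.Random) -> int:
    # Softmax sampling with the usual max-subtraction for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def stream(step: StepFn, start_token: int, temperature: float = 1.0, seed: int = 0) -> Iterator[int]:
    rng = random.Random(seed)
    token, state = start_token, None
    while True:
        # The state returned by one step is fed into the next, so the
        # stream never has to re-read earlier tokens.
        logits, state = step(token, state)
        token = sample(logits, temperature, rng)
        yield token


def toy_step(token: int, state: State) -> Tuple[List[float], State]:
    # Hypothetical stand-in for the model: strongly prefers the next token id.
    logits = [0.0] * 623
    logits[(token + 1) % 623] = 50.0
    calls = 0 if state is None else state
    return logits, calls + 1


first_events = list(islice(stream(toy_step, start_token=0), 5))
```

Because the generator is unbounded, a playback system can pull events from it at whatever rate the music requires.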

Web Demo

Launch the NanoMaestro Realtime web demo. Inference runs locally in your browser.

TODO

✅ Release full local inference code
Release full training code
Release full tokenizer code
