Magenta RealTime 2: ONNX
ONNX export of google/magenta-realtime-2
(MRT2), an open-weights real-time music-generation model from Google DeepMind. This repo
re-packages the MRT2 components as ONNX graphs that run with
onnxruntime on CPU, CUDA GPU, and the web
(onnxruntime-web or
jax-js), with no JAX, TensorFlow, or Apple-MLX runtime required.
Unofficial community export by @blanchon. All model weights © Google LLC, redistributed under CC-BY-4.0 (see License & terms below). Converted from the original
google/magenta-realtime-2 artifacts.
Components
MRT2 is three models chained together:
| Component | Role | This repo |
|---|---|---|
| MusicCoCa | text / audio → 768-d style embedding → 12 RVQ style tokens | musiccoca/*.onnx |
| SpectroStream | 48 kHz stereo audio codec (encode → 12-RVQ tokens → decode) | spectrostream/*.onnx |
| Depthformer LLM | autoregressive frame-wise generator of audio tokens (style + MIDI + context → tokens) | mrt2_small/onnx/ (230M, fp32) |
The full text-to-music pipeline is: MusicCoCa(prompt) → style tokens → Depthformer generates SpectroStream tokens frame-by-frame (25 Hz) → SpectroStream decoder → 48 kHz stereo audio.
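The rates quoted above imply some simple bookkeeping that helps when sizing token and audio buffers. The following is a minimal sketch using only the numbers stated in this README (25 Hz frames, 48 kHz stereo output, 12 RVQ codes per frame). The helper name `frame_budget` is illustrative and not part of the repo, and it assumes frames tile the audio evenly, so each frame covers 48000 / 25 = 1920 samples per channel.

```python
FRAME_RATE_HZ = 25        # frames generated per second
SAMPLE_RATE_HZ = 48_000   # decoder output sample rate
CHANNELS = 2              # stereo
RVQ_LEVELS = 12           # codes per frame


def frame_budget(seconds: float) -> dict:
    """Return frame, code, and sample counts for a clip of the given length."""
    frames = round(seconds * FRAME_RATE_HZ)
    samples_per_frame = SAMPLE_RATE_HZ // FRAME_RATE_HZ  # 1920 per channel
    return {
        "frames": frames,
        "codes": frames * RVQ_LEVELS,
        "samples_per_channel": frames * samples_per_frame,
        "samples_total": frames * samples_per_frame * CHANNELS,
    }


print(frame_budget(10.0))
```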
Repository layout
musiccoca/          # style model (5 ONNX graphs + SentencePiece)
  text_encoder.onnx  audio_preprocessor.onnx  music_encoder.onnx
  pretrained_vector_quantizer.onnx  mapper.onnx  spm.model
spectrostream/      # audio codec
  encoder.onnx  decoder.onnx
mrt2_small/onnx/    # 230M Depthformer LLM, self-contained fp32 graphs
  encoder.onnx  temporal_step.onnx  depth_step.onnx  embed.onnx
Every .onnx here is a single self-contained file (no external .onnx.data).
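After downloading the repo, a quick sanity check can confirm that the layout above is complete and that no external-data sidecars are present. The sketch below uses only the file names listed in this section. The function name `check_layout` and the assumption that the repo has been downloaded to a local folder are illustrative.

```python
from pathlib import Path

# File names copied from the repository layout listed above.
EXPECTED_FILES = {
    "musiccoca": [
        "text_encoder.onnx", "audio_preprocessor.onnx", "music_encoder.onnx",
        "pretrained_vector_quantizer.onnx", "mapper.onnx", "spm.model",
    ],
    "spectrostream": ["encoder.onnx", "decoder.onnx"],
    "mrt2_small/onnx": [
        "encoder.onnx", "temporal_step.onnx", "depth_step.onnx", "embed.onnx",
    ],
}


def check_layout(root):
    """Return (missing files, unexpected *.onnx.data sidecars) under root."""
    root = Path(root)
    missing = [
        f"{folder}/{name}"
        for folder, names in EXPECTED_FILES.items()
        for name in names
        if not (root / folder / name).is_file()
    ]
    sidecars = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.onnx.data"))
    return missing, sidecars


if __name__ == "__main__":
    missing, sidecars = check_layout(".")
    print("missing:", missing)
    print("unexpected sidecars:", sidecars)
```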
How the LLM is structured
The Depthformer is exported as four self-contained ONNX graphs (encoder.onnx,
temporal_step.onnx, depth_step.onnx, embed.onnx), driven by a thin host-side runtime
loop that carries the fixed-size windowed KV-cache between frames and performs the
sampling. For each 25 Hz frame the temporal step runs once, then the depth step and embed run
once per RVQ level (12) to emit the 12 codes for that frame. A complete, readable reference
implementation of this loop is src/lib/mrt2.ts in the
demo Space (jax-js).
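The control flow described above can be sketched in a few lines. This is only an outline of the loop shape (one temporal step per frame, then twelve depth-step + embed rounds), not the actual runtime. The callables `temporal_step`, `depth_step`, `embed`, and `sample`, along with their argument lists and the way state is passed between them, are hypothetical stand-ins for the real graph feeds, which this README does not specify. The encoder graph and its conditioning inputs are omitted entirely. The demo Space source remains the reference for the real feeds.

```python
from typing import Any, Callable, List, Tuple

RVQ_LEVELS = 12  # codes emitted per 25 Hz frame


def generate_frames(
    temporal_step: Callable[[Any, Any], Tuple[Any, Any]],  # hypothetical signature
    depth_step: Callable[[Any, int], Any],                 # hypothetical signature
    embed: Callable[[int, int], Any],                      # hypothetical signature
    sample: Callable[[Any], int],                          # host-side sampler
    kv_cache: Any,
    num_frames: int,
) -> Tuple[List[List[int]], Any]:
    """Run the per-frame loop and return (frames of 12 codes each, final cache)."""
    frames: List[List[int]] = []
    prev_frame = None  # no previous codes before the first frame
    for _ in range(num_frames):
        # Temporal step: once per frame; the host carries the cache forward.
        state, kv_cache = temporal_step(prev_frame, kv_cache)
        codes: List[int] = []
        for level in range(RVQ_LEVELS):
            # Depth step + embed: once per RVQ level, sampling on the host.
            logits = depth_step(state, level)
            code = sample(logits)
            codes.append(code)
            state = embed(code, level)
        frames.append(codes)
        prev_frame = codes
    return frames, kv_cache
```

Passing the graphs in as callables keeps the loop independent of any one runtime: each callable can wrap an onnxruntime session, a jax-js function, or a test stub.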
Quick start (Python, onnxruntime)
import numpy as np
import onnxruntime as ort

# Each graph is provider-agnostic: CPU here, or CUDA / web elsewhere.
sess = ort.InferenceSession(
    "musiccoca/pretrained_vector_quantizer.onnx",
    providers=["CPUExecutionProvider"],  # ["CUDAExecutionProvider", ...] for GPU
)
emb = np.zeros((1, 768), np.float32)  # placeholder style embedding
tokens = sess.run(None, {sess.get_inputs()[0].name: emb})[0]  # 12 RVQ style tokens
print(tokens)
The four LLM graphs and the SpectroStream decoder are driven exactly as in the runtime loop described above (see the demo Space for the canonical sequence of feeds).
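Since the exact feed names are documented only in the demo Space, it can help to list each graph's inputs and outputs before wiring up the loop. The helper below is a small sketch that relies only on `get_inputs()` / `get_outputs()` and the `name`, `type`, and `shape` attributes that onnxruntime sessions expose. The helper name `describe_io` is illustrative.

```python
def describe_io(session):
    """Return (kind, name, type, shape) rows for a session's inputs and outputs."""
    rows = []
    for kind, args in (("input", session.get_inputs()),
                       ("output", session.get_outputs())):
        for arg in args:
            rows.append((kind, arg.name, arg.type, list(arg.shape)))
    return rows


# Usage (needs onnxruntime and the downloaded repo files):
#   import onnxruntime as ort
#   sess = ort.InferenceSession("mrt2_small/onnx/temporal_step.onnx",
#                               providers=["CPUExecutionProvider"])
#   for row in describe_io(sess):
#       print(row)
```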
Quick start (Web)
A live in-browser demo (with full source) is the 🎵 demo Space: the Depthformer LLM runs natively in jax-js (WebGPU), and the SpectroStream decoder runs in onnxruntime-web (WASM) because it uses ops jax-js doesn't implement. Pick a prompt, generate, and play the result.
To load a single graph directly with onnxruntime-web:
<script type="module">
  import * as ort from "https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.all.min.mjs";
  const sess = await ort.InferenceSession.create(
    "musiccoca/pretrained_vector_quantizer.onnx",
    { executionProviders: ["webgpu", "wasm"] });
  const emb = new ort.Tensor("float32", new Float32Array(768), [1, 768]);
  const out = await sess.run({ [sess.inputNames[0]]: emb });
  console.log(out[sess.outputNames[0]].data); // 12 RVQ style tokens
</script>
Validation
All graphs were validated against the original models. Discrete outputs (RVQ tokens) match exactly; continuous outputs match within fp32 tolerance:
| Component | vs original |
|---|---|
| MusicCoCa (text → style tokens) | token-exact |
| SpectroStream codec | codes exact, decode ≤ 9e-5 |
| Depthformer LLM (small, fp32) | codes exact (PyTorch & ONNX, in-browser) |
| Full pipeline (prompt → audio) | codes exact vs JAX fp32 |
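
The two kinds of check in the table can be reproduced with a few lines of plain Python. This is a sketch rather than the script actually used for validation: the helper names are illustrative, and it assumes the 9e-5 figure is a maximum absolute difference, which the table does not state explicitly. Inputs are flat sequences (for NumPy arrays, pass `array.ravel()`).

```python
DECODE_TOLERANCE = 9e-5  # bound quoted in the table for the SpectroStream decoder


def codes_match(reference, exported) -> bool:
    """Discrete outputs (RVQ codes) must be identical element by element."""
    ref, out = list(reference), list(exported)
    return len(ref) == len(out) and all(a == b for a, b in zip(ref, out))


def max_abs_diff(reference, exported) -> float:
    """Largest absolute element-wise difference between two equal-length sequences."""
    ref, out = list(reference), list(exported)
    if len(ref) != len(out):
        raise ValueError("length mismatch")
    return max((abs(float(a) - float(b)) for a, b in zip(ref, out)), default=0.0)


if __name__ == "__main__":
    print(codes_match([3, 17, 512], [3, 17, 512]))
    print(max_abs_diff([0.25, -0.5], [0.25003, -0.5]) <= DECODE_TOLERANCE)
```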
How this was exported
- MusicCoCa: TFLite → ONNX via a patched tf2onnx (added FULLY_CONNECTED keep_num_dims, GELU, EMBEDDING_LOOKUP handlers), and the log-mel STFT RFFT/ComplexAbs island was replaced with an equivalent DFT cos/sin matmul so it uses only ONNX-standard, web-compatible ops.
- SpectroStream & Depthformer: the original sequence-layers/JAX graphs do not lower cleanly through jax2tf (opaque XlaCallModule) or jax2onnx (shape-tracing limits), so these were reimplemented in PyTorch from the architecture and checkpoint weights, validated numerically against the JAX reference, and exported with torch.onnx.
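
The RFFT/ComplexAbs replacement mentioned for MusicCoCa rests on a standard identity: the magnitude of a real FFT bin equals the Euclidean norm of the frame's projections onto a cosine and a sine basis, so two matrix products plus a square root give the same result. The sketch below illustrates that identity in plain Python. It is not the exporter's actual graph rewrite, and the function names are illustrative.

```python
import math


def dft_matrices(n):
    """Cosine and sine bases with n // 2 + 1 rows, the bin count of a length-n RFFT."""
    bins = n // 2 + 1
    cos_mat = [[math.cos(2 * math.pi * k * t / n) for t in range(n)] for k in range(bins)]
    sin_mat = [[math.sin(2 * math.pi * k * t / n) for t in range(n)] for k in range(bins)]
    return cos_mat, sin_mat


def matvec(mat, vec):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def magnitude_spectrum(frame):
    """|RFFT(frame)| computed with two matmuls instead of a complex FFT."""
    cos_mat, sin_mat = dft_matrices(len(frame))
    real = matvec(cos_mat, frame)
    imag = matvec(sin_mat, frame)  # the sign of the sine term does not affect the magnitude
    return [math.hypot(r, i) for r, i in zip(real, imag)]


if __name__ == "__main__":
    tone = [math.cos(2 * math.pi * t / 8) for t in range(8)]  # one cycle over 8 samples
    print([round(m, 6) for m in magnitude_spectrum(tone)])
```

In a graph the two bases become constant weight tensors, so the whole operation reduces to MatMul, Mul, Add, and Sqrt nodes.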
License & terms
Magenta RealTime 2 is released by Google under a combination of licenses: the codebase under Apache 2.0 and the model weights under CC-BY-4.0. The ONNX artifacts here are derived from those weights and are redistributed under CC-BY-4.0, with attribution to Google LLC / Google DeepMind. Google's MRT2 usage terms apply (Copyright 2026 Google LLC; use responsibly, do not generate infringing content; Google claims no rights in your outputs; "AS IS", no warranty). See the original model card.
Citation
@inproceedings{gdmlyria2025live,
  title={Live Music Models},
  author={Caillon, Antoine and others},
  booktitle={NeurIPS Creative AI},
  year={2025}
}
Original: https://huggingface.co/google/magenta-realtime-2 · https://github.com/magenta/magenta-realtime