Stable Audio Morph — ONNX Models

ONNX FP16 exports of the SAME autoencoder from Stable Audio 3 Small-Music for browser-based latent space audio morphing.

Models

File	Size	Description
`encoder_fp16.onnx`	104 MB	Encodes 10s stereo audio (44.1kHz) to latent [1, 256, 108]
`decoder_fp16.onnx`	105 MB	Decodes latent [1, 256, 108] to stereo audio

Usage

These models run in the browser via ONNX Runtime Web with WebGPU or WASM backends.

import * as ort from "onnxruntime-web";

const encoder = await ort.InferenceSession.create("encoder_fp16.onnx");
const decoder = await ort.InferenceSession.create("decoder_fp16.onnx");

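// Encode audio to latent.
// audioData: Float32Array of length 2 * 441000 in planar layout
// (all samples of channel 0, then all samples of channel 1).
const input = new ort.Tensor("float32", audioData, [1, 2, 441000]);
const { latent } = await encoder.run({ audio: input });

// Decode latent to audio (output shape [1, 2, 442368])
const { audio } = await decoder.run({ latent });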
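The card does not describe how the morph itself is computed. One common approach, assumed here, is to linearly interpolate between two encoder outputs before decoding. `lerpLatents` is a hypothetical helper for illustration, not part of ONNX Runtime Web or of this repository.

```javascript
// Hypothetical helper: linear interpolation between two latents.
// a and b are flat Float32Arrays of equal length (e.g. 1 * 256 * 108 values).
// t = 0 returns a copy of a, t = 1 returns a copy of b.
function lerpLatents(a, b, t) {
  if (a.length !== b.length) {
    throw new Error("latents must have the same length");
  }
  const out = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = (1 - t) * a[i] + t * b[i];
  }
  return out;
}
```

With two encoder outputs available, the mixed values could be wrapped as `new ort.Tensor("float32", mixed, [1, 256, 108])` and passed to the decoder, assuming each latent tensor's values are read from its `data` property.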

Source

Original model: stabilityai/stable-audio-3-small-music
Original repo: Stability-AI/stable-audio-3
Paper: Stable Audio 3 (Stability AI, 2025)
Autoencoder: SAME (Semantic-Acoustic autoencoder), 108M parameters
Compression: 4096x temporal downsampling (44.1kHz stereo to 256-dim latent at ~10.8 Hz)
Training data: AudioSparx (806K recordings) + Freesound (472K recordings)
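The 4096x figure and the ~10.8 Hz latent rate can be checked against the tensor shapes listed under Technical Specs. A quick sketch of the arithmetic follows; the variable names are illustrative only.

```javascript
const sampleRate = 44100;      // Hz
const decoderSamples = 442368; // decoder output length per channel
const latentFrames = 108;      // temporal frames in the latent
const latentDim = 256;         // values per latent frame

// Audio samples (per channel) covered by one latent frame.
const samplesPerFrame = decoderSamples / latentFrames; // 4096

// Latent frame rate in Hz (about 10.77).
const latentRate = sampleRate / samplesPerFrame;

// Raw stereo values per latent value within one frame.
const valueRatio = (2 * samplesPerFrame) / latentDim; // 32
```

Read this way, 4096x is the per-channel temporal downsampling factor, while the count of raw values per latent value is 32.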

Export Details

Exported from SA3 Small-Music checkpoint using torch.onnx.export (opset 18)
Converted to FP16 via onnxconverter-common
Validated: round-trip correlation > 0.99
No text encoder included (T5Gemma removed for latent-only usage)
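The card does not say how the round-trip correlation was measured. A minimal sketch, assuming it is a Pearson correlation between an input channel and the corresponding decoded channel, could look like this. `pearson` is a hypothetical helper.

```javascript
// Pearson correlation between two signals (plain arrays or Float32Arrays).
// Only the first min(x.length, y.length) samples are compared, since the
// decoder output is slightly longer than the encoder input.
// Returns NaN if either signal is constant over that range.
function pearson(x, y) {
  const n = Math.min(x.length, y.length);
  let sumX = 0;
  let sumY = 0;
  for (let i = 0; i < n; i++) {
    sumX += x[i];
    sumY += y[i];
  }
  const meanX = sumX / n;
  const meanY = sumY / n;

  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}
```

A value near 1 for each channel would be consistent with the > 0.99 figure above, though the original validation may have used a different procedure.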

Technical Specs

Input (encoder): [1, 2, 441000] float32, 10 seconds of stereo audio at 44.1kHz
Output (encoder): [1, 256, 108] float32, 256-dim latent with 108 temporal frames
Input (decoder): [1, 256, 108] float32
Output (decoder): [1, 2, 442368] float32, stereo audio (108 × 4096 samples, about 10.03 s, slightly longer than the encoder input)
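Because the encoder expects exactly 441000 samples per channel and the decoder returns 442368, some padding and trimming is usually needed around the models. The sketch below uses two hypothetical helpers. It assumes per-channel `Float32Array` inputs (such as those returned by the Web Audio `AudioBuffer.getChannelData`) and assumes the extra decoder samples are trailing padding, which the card does not state.

```javascript
const ENCODER_FRAMES = 441000; // 10 s at 44.1 kHz

// Hypothetical helper: build the flat planar buffer for a [1, 2, frames]
// tensor from two Float32Array channels, zero-padding or truncating each.
function toPlanarStereo(left, right, frames = ENCODER_FRAMES) {
  const out = new Float32Array(2 * frames); // zero-initialized
  out.set(left.subarray(0, Math.min(left.length, frames)), 0);
  out.set(right.subarray(0, Math.min(right.length, frames)), frames);
  return out;
}

// Hypothetical helper: split a flat planar [1, 2, n] buffer into channels
// and keep only the first `frames` samples of each (assumes trailing padding).
function trimPlanarStereo(data, frames = ENCODER_FRAMES) {
  const n = data.length / 2;
  const keep = Math.min(n, frames);
  return {
    left: data.slice(0, keep),
    right: data.slice(n, n + keep),
  };
}
```

The output of `toPlanarStereo` can serve as the `audioData` buffer for the encoder input tensor, and `trimPlanarStereo` can be applied to the decoder output's values before playback.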

License

These weights are derived from Stability AI's Stable Audio 3 model. Usage is subject to the Stability AI Community License.
