Stable Audio Morph - ONNX Models

ONNX FP16 exports of the SAME autoencoder from Stable Audio 3 Small-Music for browser-based latent space audio morphing.

Models

File                Size     Description
encoder_fp16.onnx   104 MB   Encodes 10s stereo audio (44.1kHz) to latent [1, 256, 108]
decoder_fp16.onnx   105 MB   Decodes latent [1, 256, 108] to stereo audio

Usage

These models run in the browser via ONNX Runtime Web with WebGPU or WASM backends.

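import * as ort from "onnxruntime-web";

// Try WebGPU first, then WASM if WebGPU is unavailable.
const options = { executionProviders: ["webgpu", "wasm"] };
const encoder = await ort.InferenceSession.create("encoder_fp16.onnx", options);
const decoder = await ort.InferenceSession.create("decoder_fp16.onnx", options);

// Encode audio to latent.
// audioData: Float32Array of length 2 * 441000 (left channel, then right channel).
const input = new ort.Tensor("float32", audioData, [1, 2, 441000]);
const { latent } = await encoder.run({ audio: input });

// Decode latent to audio; the output tensor has shape [1, 2, 442368].
const { audio } = await decoder.run({ latent });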
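
The snippet above only round-trips one clip. For morphing, one simple option is to blend two encoded latents before decoding. The sketch below assumes plain linear interpolation, which this card does not confirm as the intended morphing method. `lerpLatents` is a hypothetical helper name, and the arrays are stand-in data sized like the encoder output.

```javascript
// Linear interpolation between two latents stored as flat Float32Arrays.
// t = 0 returns a copy of latentA, t = 1 returns a copy of latentB.
// Assumption: linear blending is only one possible morphing strategy.
function lerpLatents(latentA, latentB, t) {
  if (latentA.length !== latentB.length) {
    throw new Error("Latents must have the same number of elements");
  }
  const out = new Float32Array(latentA.length);
  for (let i = 0; i < out.length; i++) {
    out[i] = (1 - t) * latentA[i] + t * latentB[i];
  }
  return out;
}

// Stand-in data with the encoder's output size (1 * 256 * 108 values).
const LATENT_SIZE = 1 * 256 * 108;
const latentA = new Float32Array(LATENT_SIZE).fill(0);
const latentB = new Float32Array(LATENT_SIZE).fill(1);
const halfway = lerpLatents(latentA, latentB, 0.5);
```

In the browser, the blended array would be wrapped as `new ort.Tensor("float32", halfway, [1, 256, 108])` and passed to `decoder.run({ latent: ... })` in place of a single clip's latent.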

Source

  • Original model: stabilityai/stable-audio-3-small-music
  • Original repo: Stability-AI/stable-audio-3
  • Paper: Stable Audio 3 (Stability AI, 2025)
  • Autoencoder: SAME (Semantic-Acoustic autoencoder), 108M parameters
  • Temporal downsampling: 4096x (44.1kHz stereo to 256-dim latent at ~10.8 Hz)
  • Training data: AudioSparx (806K recordings) + Freesound (472K recordings)
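
The 4096x factor, the ~10.8 Hz latent rate, and the tensor shapes quoted in this card are linked by simple arithmetic. The sketch below reproduces them, assuming the encoder pads the input up to a whole number of 4096-sample frames. That padding behaviour is inferred from the shapes and is not stated in the source.

```javascript
const SAMPLE_RATE = 44100;    // Hz
const DOWNSAMPLING = 4096;    // audio samples per latent frame
const INPUT_SAMPLES = 441000; // 10 s at 44.1kHz

// Latent frames per second of audio.
const latentRateHz = SAMPLE_RATE / DOWNSAMPLING;

// Assumption: the input is padded up to a whole number of frames.
const latentFrames = Math.ceil(INPUT_SAMPLES / DOWNSAMPLING);

// Samples per channel produced when every latent frame is decoded.
const decodedSamples = latentFrames * DOWNSAMPLING;

console.log(latentRateHz.toFixed(2), latentFrames, decodedSamples);
// prints "10.77 108 442368"
```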

Export Details

  • Exported from SA3 Small-Music checkpoint using torch.onnx.export (opset 18)
  • Converted to FP16 via onnxconverter-common
  • Validated: round-trip correlation > 0.99
  • No text encoder included (T5Gemma removed for latent-only usage)
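
The card does not say how the round-trip correlation was measured. One plausible reading is a Pearson correlation between the input waveform and the decoded waveform, once the decoded signal has been trimmed to the input length. The sketch below shows that metric on toy data. `pearson` is a hypothetical helper and not the author's validation script.

```javascript
// Pearson correlation between two equal-length signals.
// Assumption: this is one possible definition of "round-trip correlation".
function pearson(x, y) {
  if (x.length !== y.length || x.length === 0) {
    throw new Error("Signals must be non-empty and equal in length");
  }
  const n = x.length;
  let meanX = 0;
  let meanY = 0;
  for (let i = 0; i < n; i++) {
    meanX += x[i];
    meanY += y[i];
  }
  meanX /= n;
  meanY /= n;

  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}

// Toy data: a 440 Hz sine at 44.1kHz and an attenuated copy of it.
const original = Float32Array.from({ length: 4410 }, (_, i) =>
  Math.sin((2 * Math.PI * 440 * i) / 44100)
);
const attenuated = original.map((v) => 0.9 * v);
const correlation = pearson(original, attenuated);
```

Because Pearson correlation ignores overall gain, the attenuated copy still scores very close to 1. A real validation would compare each channel of the decoded audio against the matching input channel.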

Technical Specs

  • Input (encoder): [1, 2, 441000] float32, 10 seconds stereo at 44.1kHz
  • Output (encoder): [1, 256, 108] float32, 256-dim latent with 108 temporal frames
  • Input (decoder): [1, 256, 108] float32
  • Output (decoder): [1, 2, 442368] float32, stereo audio (108 frames × 4096 samples, about 10.03 s)
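
Because the encoder expects exactly 441000 samples per channel and the decoder returns 442368, some packing and trimming is needed around the two models. The sketch below shows one way to do it, assuming each channel arrives as a Float32Array and that the extra 1368 decoded samples are padding that can be dropped. `packStereo` and `unpackStereo` are hypothetical helper names.

```javascript
const NUM_SAMPLES = 441000; // samples per channel expected by the encoder

// Pack two channel arrays into one planar buffer [left..., right...],
// truncating or zero-padding each channel to numSamples.
// This matches the row-major layout of a [1, 2, numSamples] tensor.
function packStereo(left, right, numSamples = NUM_SAMPLES) {
  const out = new Float32Array(2 * numSamples); // zero-initialised
  out.set(left.subarray(0, numSamples), 0);
  out.set(right.subarray(0, numSamples), numSamples);
  return out;
}

// Split a planar decoder output into channels and drop the tail.
// Assumption: samples beyond keepSamples are padding.
function unpackStereo(planar, decodedSamples, keepSamples = NUM_SAMPLES) {
  const left = planar.slice(0, keepSamples);
  const right = planar.slice(decodedSamples, decodedSamples + keepSamples);
  return { left, right };
}

// Small demonstration with 5 samples per channel instead of 441000.
const packed = packStereo(
  new Float32Array(3).fill(1),  // shorter than 5: zero-padded
  new Float32Array(10).fill(2), // longer than 5: truncated
  5
);
const channels = unpackStereo(packed, 5, 4);
```

With the real models, `packStereo(left, right)` would supply the `audioData` buffer for the encoder input, and `unpackStereo(audio.data, 442368)` would recover two 441000-sample channels from the decoder output.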

License

These weights are derived from Stability AI's Stable Audio 3 model. Usage is subject to the Stability AI Community License.
