Stable Audio 3 Medium — ONNX

ONNX conversion of Stability AI's Stable Audio 3 Medium model, optimized for CPU inference from C++ applications.
Powered by Stability AI.

⚠️ License — Read Before Downloading

This repository contains files governed by two distinct licenses. You must accept both before using or downloading any file.

1. Stability AI Community License

The ONNX model files (DiT, encoder, decoder) are derived from Stability AI's Stable Audio 3 Medium model and are subject to the Stability AI Community License Agreement.

"This Stability AI Model is licensed under the Stability AI Community License, Copyright © Stability AI Ltd. All Rights Reserved."

📄 Full license: https://stability.ai/community-license-agreement

Key points:

✅ Free for research and non-commercial use
✅ Free for commercial use if your annual revenue is under $1,000,000 USD
⚠️ Annual revenue above $1,000,000 USD requires an enterprise license from Stability AI
❌ Cannot be used to train or improve other foundational generative AI models

2. Gemma Terms of Use (T5Gemma / Text Encoder & Tokenizer)

The tokenizer and text encoder components include weights and architecture derived from Google's T5Gemma model, subject to the Gemma Terms of Use.

📄 Full license: https://ai.google.dev/gemma/terms

Key points:

✅ Free for research and commercial use (subject to terms)
❌ Cannot be used to train models that compete with Google's Gemma products
❌ Cannot be used to circumvent safety filters or policies

Files

File	Description	Size (approx.)
`dit.onnx`	Diffusion Transformer (main generation model)	~3.88 GB
`dit.onnx.data`	External weights for the DiT model	~5.81 GB
`dec_dynamic_bf16.onnx`	Audio decoder (latents → waveform)	~219 MB
`enc_dynamic_bf16.onnx`	Audio encoder (waveform → latents)	~216 MB
`encoder.onnx`	Text/conditioning encoder (T5Gemma)	~620 MB
`tokenizer.json`	T5Gemma tokenizer (Google)	~32.7 MB
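Because `dit.onnx` keeps its weights in `dit.onnx.data`, the two files should stay together in the same directory: ONNX resolves external-data locations relative to the model file's directory (assuming, as is usual, that the recorded location is just the file name). A small C++17 pre-flight check can catch a partial download before any inference session is created. This is only a sketch; the function name `missing_model_files` is hypothetical, and the file list simply mirrors the table above.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Returns the names of expected model files that are missing from `dir`.
// Hypothetical helper: the list mirrors the Files table and must be kept
// in sync with the repository contents by hand.
std::vector<std::string> missing_model_files(const fs::path& dir) {
    static const std::vector<std::string> kExpected = {
        "dit.onnx",
        "dit.onnx.data",          // external weights, resolved next to dit.onnx
        "dec_dynamic_bf16.onnx",
        "enc_dynamic_bf16.onnx",
        "encoder.onnx",
        "tokenizer.json",
    };
    std::vector<std::string> missing;
    for (const auto& name : kExpected) {
        std::error_code ec;  // non-throwing overload: any error counts as "missing"
        if (!fs::is_regular_file(dir / name, ec)) {
            missing.push_back(name);
        }
    }
    return missing;
}
```

An empty result means every listed file is present as a regular file; it does not validate file contents or sizes.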

Pipeline

Text Prompt → encoder.onnx + tokenizer.json   (T5Gemma)
                    ↓
            dit.onnx + dit.onnx.data              (diffusion)
                    ↓
            dec_dynamic_bf16.onnx                 (audio decoding)
                    ↓
                  WAV data (stereo, 44.1 kHz)
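The last stage ends in stereo WAV data at 44.1 kHz, so the decoder's float waveform still has to be packed into a file. The sketch below assumes the decoder output has already been copied out of the ONNX Runtime tensor into planar float32 buffers (one vector per channel) with samples nominally in [-1, 1]; neither the tensor layout nor the value range is documented here, so verify both. The names `encode_wav_pcm16` and `put_le` are hypothetical helpers, not part of anything shipped in this repository.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Appends `value` to `out` as `n_bytes` little-endian bytes (WAV fields are little-endian).
static void put_le(std::vector<std::uint8_t>& out, std::uint32_t value, int n_bytes) {
    for (int i = 0; i < n_bytes; ++i) {
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFFu));
    }
}

// Packs planar float audio (one vector per channel) into an in-memory
// 16-bit PCM WAV file with a canonical 44-byte header.
// ASSUMPTION: decoder output was already copied into `channels` as planar float32.
std::vector<std::uint8_t> encode_wav_pcm16(const std::vector<std::vector<float>>& channels,
                                           std::uint32_t sample_rate = 44100) {
    if (channels.empty()) throw std::invalid_argument("no channels");
    const std::size_t frames = channels[0].size();
    for (const auto& ch : channels) {
        if (ch.size() != frames) throw std::invalid_argument("channel lengths differ");
    }

    const auto num_channels = static_cast<std::uint16_t>(channels.size());
    const std::uint16_t bits_per_sample = 16;
    const auto block_align = static_cast<std::uint16_t>(num_channels * (bits_per_sample / 8));
    const auto data_bytes = static_cast<std::uint32_t>(frames * block_align);

    std::vector<std::uint8_t> out;
    out.reserve(44 + data_bytes);
    auto put_tag = [&out](const char* tag) { out.insert(out.end(), tag, tag + 4); };

    put_tag("RIFF");
    put_le(out, 36 + data_bytes, 4);            // RIFF chunk size = file size - 8
    put_tag("WAVE");
    put_tag("fmt ");
    put_le(out, 16, 4);                         // fmt chunk size for plain PCM
    put_le(out, 1, 2);                          // format tag 1 = integer PCM
    put_le(out, num_channels, 2);
    put_le(out, sample_rate, 4);
    put_le(out, sample_rate * block_align, 4);  // byte rate
    put_le(out, block_align, 2);
    put_le(out, bits_per_sample, 2);
    put_tag("data");
    put_le(out, data_bytes, 4);

    // Interleave frame by frame (L0 R0 L1 R1 ... for stereo).
    for (std::size_t i = 0; i < frames; ++i) {
        for (const auto& ch : channels) {
            // Map NaN to silence and clip out-of-range samples before quantizing.
            const float s = std::isnan(ch[i]) ? 0.0f : std::clamp(ch[i], -1.0f, 1.0f);
            const auto q = static_cast<std::int16_t>(std::lround(s * 32767.0f));
            put_le(out, static_cast<std::uint16_t>(q), 2);  // two's-complement bytes
        }
    }
    return out;
}
```

To save the result, write the returned bytes with `std::ofstream(path, std::ios::binary)`. Sixteen-bit PCM is the most widely supported WAV variant; if the extra precision matters, a 32-bit float WAV (format tag 3) avoids quantizing the decoder output.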

Runs entirely locally on CPU. No internet connection required after the initial download. No API key. No cloud.

Attribution

Original model: Stable Audio 3 Optimized by Stability AI

Powered by Stability AI
