Stable Audio 3 Medium - ONNX
ONNX conversion of Stability AI's Stable Audio 3 Medium model, optimized for CPU inference from C++ applications.
Powered by Stability AI.
⚠️ License: Read Before Downloading
This repository contains files governed by two distinct licenses. You must accept both before using or downloading any file.
1. Stability AI Community License
The ONNX model files (DIT, encoder, decoder) are derived from Stability AI's Stable Audio 3 Medium model and are subject to the Stability AI Community License Agreement.
"This Stability AI Model is licensed under the Stability AI Community License, Copyright © Stability AI Ltd. All Rights Reserved."
Full license: https://stability.ai/community-license-agreement
Key points:
- ✅ Free for research and non-commercial use
- ✅ Free for commercial use if your annual revenue is under $1,000,000 USD
- ⚠️ Revenue above $1M requires an enterprise license from Stability AI
- ❌ Cannot be used to train or improve other foundational generative AI models
2. Gemma Terms of Use (T5Gemma / Text Encoder & Tokenizer)
The tokenizer and text encoder components include weights and architecture derived from Google's T5Gemma model, subject to the Gemma Terms of Use.
Full license: https://ai.google.dev/gemma/terms
Key points:
- ✅ Free for research and commercial use (subject to terms)
- ❌ Cannot be used to train models that compete with Google's Gemma products
- ❌ Cannot be used to circumvent safety filters or policies
Files
| File | Description | Size (approx.) |
|---|---|---|
| `dit.onnx` | Diffusion Transformer (main generation model) | ~3.88 GB |
| `dit.onnx.data` | External weights for the DIT model | ~5.81 GB |
| `dec_dynamic_bf16.onnx` | Audio decoder (latents → waveform) | ~219 MB |
| `enc_dynamic_bf16.onnx` | Audio encoder (waveform → latents) | ~216 MB |
| `encoder.onnx` | Text/conditioning encoder (T5Gemma) | ~620 MB |
| `tokenizer.json` | T5Gemma tokenizer (Google) | ~32.7 MB |
Pipeline
Text Prompt → encoder.onnx + tokenizer.json (T5Gemma)
↓
dit_fp32.onnx + dit_fp32.onnx.data (diffusion)
↓
dec_dynamic_bf16.onnx (audio decoding)
↓
WAV data (stereo, 44.1 kHz)
Runs entirely locally on CPU. No internet connection required after the initial download. No API key. No cloud.
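The last pipeline step turns decoded samples into WAV data. As a rough sketch of that step in standard C++17, the code below packs samples into an in-memory WAV file at 44.1 kHz stereo. Several things here are assumptions rather than facts about this repository: the decoder output is taken to be interleaved `float` samples (L, R, L, R, ...) in the range [-1, 1], the output format is 16-bit PCM, and the name `encode_wav` is hypothetical. If the decoder emits planar channels, they would need to be interleaved first.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kSampleRate = 44100;  // from the pipeline description
constexpr std::uint16_t kChannels = 2;        // stereo
constexpr std::uint16_t kBitsPerSample = 16;  // assumption: 16-bit PCM

// RIFF/WAVE stores all integers in little-endian byte order.
void put_u16(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
}

void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    put_u16(out, static_cast<std::uint16_t>(v & 0xFFFF));
    put_u16(out, static_cast<std::uint16_t>(v >> 16));
}

// Appends a four-character chunk tag such as "RIFF" (without the terminator).
void put_tag(std::vector<std::uint8_t>& out, const char (&tag)[5]) {
    out.insert(out.end(), tag, tag + 4);
}

// Hypothetical helper: encodes interleaved float samples as a 16-bit PCM WAV.
std::vector<std::uint8_t> encode_wav(const std::vector<float>& interleaved) {
    const auto block_align =
        static_cast<std::uint16_t>(kChannels * kBitsPerSample / 8);
    const auto data_bytes =
        static_cast<std::uint32_t>(interleaved.size() * sizeof(std::int16_t));

    std::vector<std::uint8_t> out;
    out.reserve(44 + data_bytes);

    put_tag(out, "RIFF");
    put_u32(out, 36 + data_bytes);  // size of everything after this field
    put_tag(out, "WAVE");

    put_tag(out, "fmt ");
    put_u32(out, 16);               // size of the PCM format chunk
    put_u16(out, 1);                // format tag 1 = integer PCM
    put_u16(out, kChannels);
    put_u32(out, kSampleRate);
    put_u32(out, kSampleRate * block_align);  // bytes per second
    put_u16(out, block_align);
    put_u16(out, kBitsPerSample);

    put_tag(out, "data");
    put_u32(out, data_bytes);
    for (float s : interleaved) {
        // Clamp to [-1, 1] so out-of-range samples saturate instead of wrapping.
        const float clamped = std::clamp(s, -1.0f, 1.0f);
        const auto pcm =
            static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
        put_u16(out, static_cast<std::uint16_t>(pcm));
    }
    return out;
}
```

The returned bytes can be written to disk with a `std::ofstream` opened in binary mode. One second of audio corresponds to 44100 × 2 samples, so the resulting file is 44 header bytes plus 176,400 data bytes.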
Attribution
- Original model: Stable Audio 3 Optimized by Stability AI
Powered by Stability AI