SupertonicTTS-3 — int8 ONNX (on-device)

Dynamic int8 quantization of Supertone/supertonic-3 for low-memory on-device inference. It keeps the same four-graph, non-autoregressive flow-matching pipeline (duration_predictor → text_encoder → vector_estimator ×N → vocoder), the same 31 languages, 44.1 kHz output, and the ONNX Runtime backend; it is just smaller and lighter.
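Conceptually, the vector_estimator ×N stage integrates a flow-matching ODE from Gaussian noise toward the acoustic latent in N steps. The sketch below shows that loop with a plain Euler solver and a toy velocity function standing in for the vector_estimator graph. The velocity callable, the uniform time grid, and the Euler update are illustrative assumptions, not the documented inputs or internals of the exported ONNX graph (the real graph may, for example, perform the update itself).

```python
import random
from typing import Callable, List

# Hypothetical stand-in for the vector_estimator graph: maps (latent, t) to a velocity.
Velocity = Callable[[List[float], float], List[float]]


def euler_flow_sample(velocity: Velocity, x0: List[float], num_steps: int) -> List[float]:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with num_steps Euler steps."""
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    dt = 1.0 / num_steps
    x = list(x0)
    for step in range(num_steps):
        t = step * dt
        v = velocity(x, t)  # the real pipeline would make one vector_estimator call here
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Toy straight-line flow: along x_t = (1 - t) * x0 + t * x1 the velocity
    # (x1 - x_t) / (1 - t) equals the constant x1 - x0, so Euler lands on x1.
    target = [0.5, -1.0, 2.0]
    noise = [random.gauss(0.0, 1.0) for _ in target]
    out = euler_flow_sample(
        lambda x, t: [(y - xi) / (1.0 - t) for xi, y in zip(x, target)],
        noise,
        8,  # the card quotes RTF at 8 steps
    )
    print([round(v, 6) for v in out])  # [0.5, -1.0, 2.0]
```

Fewer steps means fewer vector_estimator calls and lower latency, which is why the step count is quoted alongside RTF.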

What changed vs the base

Weights quantized to int8 via onnxruntime.quantization.quantize_dynamic (QInt8); activations stay fp32 and are quantized on the fly only inside the int8 ops.
ONNX weights: 398 MB → 102 MB. Peak inference RSS: ~1026 MB → ~327 MB (3.1× lower; measured on Apple Silicon with the ORT CPU execution provider).
Real-time factor (RTF) is roughly unchanged on Apple Silicon (~0.20 at 8 flow-matching steps); the int8 speedup shows up on weaker ARM CPUs and NPUs.
Round-trip verified for English, German, and Korean.
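The two runtime metrics quoted above can be reproduced roughly as follows. RTF is wall-clock synthesis time divided by the duration of the generated audio (lower is faster; 0.20 means one second of audio takes about 0.2 s to synthesize). Peak RSS here comes from the standard-library resource module (Unix only), whose ru_maxrss is reported in bytes on macOS and in kilobytes on Linux. The synthesize callable is a hypothetical placeholder for whatever runs the four graphs; this is a sketch, not the harness that produced the card's numbers.

```python
import resource
import sys
import time
from typing import Callable, Tuple

SAMPLE_RATE = 44_100  # pipeline output rate (44.1 kHz)


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def peak_rss_mib() -> float:
    """Peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in bytes on macOS but in kilobytes on Linux.
    peak_bytes = peak if sys.platform == "darwin" else peak * 1024
    return peak_bytes / (1024 * 1024)


def benchmark(synthesize: Callable[[str], int], text: str) -> Tuple[float, float]:
    """Time one call of a hypothetical `synthesize(text) -> num_samples`; return (RTF, peak MiB)."""
    start = time.perf_counter()
    num_samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, num_samples / SAMPLE_RATE), peak_rss_mib()


if __name__ == "__main__":
    print(real_time_factor(2.0, 10.0))  # 0.2
```

Measuring a single warm call after one warm-up run usually gives a steadier RTF than timing the first call, which includes session start-up.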

Files: onnx/{duration_predictor,text_encoder,vector_estimator,vocoder}.onnx (int8) · onnx/tts.json · onnx/unicode_indexer.json · voice_styles/*.json (10 voices) · config.json · LICENSE.
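The int8 graphs listed above could be regenerated from the fp32 upstream exports along these lines, calling onnxruntime.quantization.quantize_dynamic with QuantType.QInt8. The directory names are hypothetical, and any extra options the original conversion used (per-channel weights, excluded nodes, and so on) are not documented here, so byte-identical output is not guaranteed.

```python
from pathlib import Path
from typing import List, Tuple

# The four graphs of the pipeline, in execution order.
GRAPHS = ("duration_predictor", "text_encoder", "vector_estimator", "vocoder")


def quantization_plan(src_dir: str, dst_dir: str) -> List[Tuple[Path, Path]]:
    """Pair each fp32 graph in src_dir with its int8 output path in dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    return [(src / f"{name}.onnx", dst / f"{name}.onnx") for name in GRAPHS]


def quantize_all(src_dir: str, dst_dir: str) -> None:
    """Quantize weights to int8; activations are quantized dynamically at runtime."""
    # Imported here so the planning helper above works without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for fp32_path, int8_path in quantization_plan(src_dir, dst_dir):
        quantize_dynamic(
            model_input=str(fp32_path),
            model_output=str(int8_path),
            weight_type=QuantType.QInt8,
        )
```

With the hypothetical layout above, quantize_all("fp32_onnx", "int8_onnx") would write the four quantized graphs into int8_onnx/.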

Usage

Use it as a drop-in for the supertonic package by pointing model_dir at this model, or run the four graphs directly with ONNX Runtime. The text front-end is G2P-free: NFKD normalization plus a unicode_indexer.json lookup, with no espeak or phonemizer dependency.
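A minimal sketch of that text front-end: NFKD-normalize the string, then map each character's Unicode code point to a token id through the indexer table. It assumes unicode_indexer.json holds a flat JSON list indexed by code point, with negative values for unsupported characters; that layout, the skip-unknown policy, and the helper names are assumptions for illustration, and the upstream pipeline may apply further cleanup (or language tags) before this step.

```python
import json
import unicodedata
from typing import List, Sequence


def load_indexer(path: str) -> List[int]:
    """Load the code-point -> token-id table (assumed to be a flat JSON list)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def text_to_ids(text: str, indexer: Sequence[int]) -> List[int]:
    """NFKD-normalize text, then look up each code point; unknown characters are skipped."""
    ids = []
    for ch in unicodedata.normalize("NFKD", text):
        cp = ord(ch)
        token = indexer[cp] if cp < len(indexer) else -1
        if token >= 0:  # assumed convention: negative means "not in the vocabulary"
            ids.append(token)
    return ids


if __name__ == "__main__":
    # Toy table: NFKD splits "é" into "e" + U+0301 (combining acute accent).
    toy = [-1] * 0x0302
    toy[ord("e")] = 5
    toy[0x0301] = 9
    print(text_to_ids("é", toy))  # [5, 9]
```

Because NFKD decomposes accented and compatibility characters into base letters plus combining marks, a single code-point table can cover many scripts without a phonemizer.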

Attribution & license

Derivative of Supertone/supertonic-3 (commit 3cadd1ee6394adea1bd021217a0e650ede09a323) by Supertone, Inc. (paper arXiv:2503.23108). Licensed under BigScience OpenRAIL-M: the upstream use-based restrictions carry over (no non-consensual impersonation or deepfakes, no undisclosed machine-generated content, etc.) and must be passed on to downstream users. As the license requires, this card marks the model as a modified (quantized) artifact. The original LICENSE is included.

Other Supertonic-3 formats

Supertonic-3 — LiteRT — Android / Qualcomm NPU (.tflite).
Supertonic-3 — CoreML — iOS / Apple Neural Engine (.mlpackage).

Ecosystem

soniqo.audio — website / use-case explorer (transcription, voice cloning, live ASR, voice agents).
speech-core — C++ orchestration library; Supertonic plugs in as a TTSInterface ONNX model.
speech-swift — Apple Silicon MLX + CoreML runtime.
speech-android — Android SDK consuming on-device LiteRT bundles.