SupertonicTTS-3 — LiteRT (.tflite, Android / Qualcomm NPU)

First-party LiteRT export of Supertonic-3's four non-autoregressive flow-matching graphs. Built by our own pipeline (speech-models/stmodels): weights lifted from the Supertone/supertonic-3 ONNX initializers → PyTorch nn.Module → litert_torch.convert (torch.export → StableHLO → TFLite). This avoids the onnx2tf NCHW/ConvNeXt layout failures that block direct ONNX→TFLite for this model.

Graphs & parity (FP32, vs ONNX Runtime)

| Module | Size (.tflite) | Parity, max\|Δ\| |
| --- | --- | --- |
| `duration_predictor.tflite` | 3.4 MB | 4.1e-05 ✓ |
| `vector_estimator.tflite` (ODE denoiser) | 244 MB | 5.6e-03 ✓ |
| `vocoder.tflite` | 97 MB | 2.6e-04 ✓ |
| `text_encoder.tflite` | 34 MB | 1.1e-01 (localized; mean ~2.5e-4) ⚠️ |
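
The parity figures above are maximum absolute differences against ONNX Runtime. A comparison along these lines would produce them; this is a minimal sketch, not the pipeline's actual check. `parity_report` and the `tol` threshold are hypothetical, and both inputs are assumed to be outputs for the same input, available as NumPy arrays of equal shape.

```python
import numpy as np

def parity_report(reference, candidate, tol=1e-2):
    """Compare two outputs of equal shape (name and tol are illustrative)."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    diff = np.abs(ref - cand)
    worst = np.unravel_index(diff.argmax(), diff.shape)
    return {
        "max_abs": float(diff.max()),            # the max|Δ| column
        "mean_abs": float(diff.mean()),          # average error over all elements
        "argmax": tuple(int(i) for i in worst),  # where the worst element sits
        "within_tol": bool(diff.max() <= tol),
    }
```

A large `max_abs` next to a small `mean_abs`, as in the `text_encoder.tflite` row, points to error concentrated in a few elements rather than spread across the output; `argmax` helps locate them.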

Shapes are fixed in this revision (T=128, L=64): pad or segment text to 128 and bucket the latent length; dynamic axes are a follow-up. The host runs the flow-matching ODE loop, calling vector_estimator total_steps times. Assets needed to drive the graphs: tts.json, unicode_indexer.json (G2P-free tokenizer), and voice_styles/*.json.
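
The host-side work described here (fixed-length text input and the ODE loop) could look roughly like the sketch below. Everything in it is an assumption for illustration: `segment_and_pad`, `euler_flow_matching`, `pad_id=0`, and the `velocity_fn` wrapper are hypothetical names, and the loop assumes the estimator returns a velocity that is integrated with plain Euler steps from t=0 to t=1. If the exported graph instead returns the updated latent, the loop reduces to calling it total_steps times.

```python
import numpy as np

T_TEXT = 128    # fixed text length in this revision
L_LATENT = 64   # fixed latent length in this revision

def segment_and_pad(token_ids, length=T_TEXT, pad_id=0):
    """Split ids into fixed-length chunks and return (ids, mask) arrays.

    pad_id=0 is an assumption; use the pad index of the real tokenizer.
    """
    ids = list(token_ids)
    chunks = [ids[i:i + length] for i in range(0, len(ids), length)] or [[]]
    out = np.full((len(chunks), length), pad_id, dtype=np.int64)
    mask = np.zeros((len(chunks), length), dtype=np.float32)
    for row, chunk in enumerate(chunks):
        out[row, :len(chunk)] = chunk   # real tokens first
        mask[row, :len(chunk)] = 1.0    # 1 marks real tokens, 0 marks padding
    return out, mask

def euler_flow_matching(velocity_fn, x0, total_steps):
    """Integrate dx/dt = v(x, t) over [0, 1] with fixed Euler steps.

    velocity_fn is a hypothetical wrapper around one vector_estimator call.
    """
    x = np.array(x0, dtype=np.float32)
    dt = 1.0 / total_steps
    for step in range(total_steps):
        t = step * dt
        x = x + dt * velocity_fn(x, t)
    return x
```

Splitting at a fixed token count is the simplest option; a real front end would more likely cut at sentence or phrase boundaries so that each segment is synthesized as a natural unit.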

Running on Android / Qualcomm NPU

CPU/GPU: use the LiteRT (ai_edge_litert / TFLite) interpreter with the XNNPACK or GPU delegate.
Qualcomm HTP/NPU: use the LiteRT QNN delegate at runtime, or compile these graphs to a QNN context binary via Qualcomm AI Hub (qai_hub); static shapes are HTP-friendly. int8/int4 PTQ via ai-edge-quantizer for full HTP residency is a follow-up.
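
Before deploying to a device, a single graph can be smoke-tested on a desktop CPU with the Python LiteRT interpreter. The sketch below assumes the `ai_edge_litert` package is installed; `zero_inputs` and `run_graph` are hypothetical helpers, and since the graphs' input names and order are not documented here, inspect `get_input_details()` before feeding real data.

```python
import numpy as np

def zero_inputs(input_details):
    """Build zero-filled arrays matching interpreter.get_input_details()."""
    return [np.zeros(tuple(d["shape"]), dtype=d["dtype"]) for d in input_details]

def run_graph(model_path, inputs=None, num_threads=4):
    """Run one fixed-shape .tflite graph on CPU and return its outputs."""
    # Imported lazily so zero_inputs works without the runtime installed.
    from ai_edge_litert.interpreter import Interpreter

    interpreter = Interpreter(model_path=model_path, num_threads=num_threads)
    interpreter.allocate_tensors()
    in_details = interpreter.get_input_details()
    if inputs is None:
        inputs = zero_inputs(in_details)  # smoke test only, not real data
    for detail, value in zip(in_details, inputs):
        interpreter.set_tensor(detail["index"], value)
    interpreter.invoke()
    out_details = interpreter.get_output_details()
    return [interpreter.get_tensor(d["index"]) for d in out_details]
```

For example, `run_graph("duration_predictor.tflite")` should return one array per graph output if the file loads and all ops are supported. Delegates are passed through the interpreter's `experimental_delegates` argument; on Android, the equivalent calls go through the Kotlin/Java LiteRT API.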

Attribution & license

Weights: derivative of Supertone/supertonic-3 (commit 3cadd1ee6394adea1bd021217a0e650ede09a323), Supertone Inc., arXiv:2503.23108 — OpenRAIL-M (use-based restrictions carry over).

Other Supertonic-3 formats

Supertonic-3 — ONNX (INT8) — server / desktop (ONNX Runtime).
Supertonic-3 — CoreML — iOS / Apple Neural Engine (.mlpackage).

Ecosystem

soniqo.audio — website / use-case explorer (transcription, voice cloning, live ASR, voice agents).
speech-core — C++ orchestration library; Supertonic plugs in as a TTSInterface LiteRT model.
speech-swift — Apple Silicon MLX + CoreML runtime.
speech-android — Android SDK consuming on-device LiteRT bundles.

Other LiteRT models in this collection

VoxCPM2 — LiteRT (INT8) (TTS)
Parakeet TDT 0.6B v3 — LiteRT (INT8) · Nemotron Speech Streaming — LiteRT (ASR)
Silero VAD v5 — LiteRT (VAD)

Paper

SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System (arXiv:2503.23108, published Mar 29, 2025).