- Supertonic
How to use soniqo/Supertonic-3-LiteRT with Supertonic:

```python
from supertonic import TTS

tts = TTS(auto_download=True)
style = tts.get_voice_style(voice_name="M1")
text = "The train delay was announced at 4:45 PM on Wed, Apr 3, 2024 due to track maintenance."
wav, duration = tts.synthesize(text, voice_style=style)
tts.save_audio(wav, "output.wav")
```
SupertonicTTS-3: LiteRT (.tflite, Android / Qualcomm NPU)
First-party LiteRT export of Supertonic-3's four non-autoregressive flow-matching graphs. Built by
our own pipeline (speech-models/stmodels): weights lifted from the
Supertone/supertonic-3 ONNX initializers → PyTorch
nn.Module → litert_torch.convert (torch.export → StableHLO → TFLite). This avoids the onnx2tf
NCHW/ConvNeXt layout failures that block direct ONNX → TFLite conversion for this model.
Graphs & parity (FP32, vs ONNX Runtime)
| Module | Size (.tflite) | Parity, max \|Δ\| |
|---|---|---|
| duration_predictor.tflite | 3.4 MB | 4.1e-05 ✅ |
| vector_estimator.tflite (ODE denoiser) | 244 MB | 5.6e-03 ✅ |
| vocoder.tflite | 97 MB | 2.6e-04 ✅ |
| text_encoder.tflite | 34 MB | 1.1e-01 (localized; mean ~2.5e-4) ⚠️ |
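The text_encoder row reports a large maximum error next to a small mean, which is what a few isolated outliers produce. The sketch below shows how the two statistics can be computed and why they diverge. It assumes both runtimes' outputs have been flattened to equal-length sequences of floats; `parity_stats` is a hypothetical helper, and the numbers are toy data, not measurements from this model.

```python
def parity_stats(reference, candidate):
    """Return (max |delta|, mean |delta|) for two equal-length flat float sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    if len(reference) == 0:
        raise ValueError("outputs must not be empty")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    return max(diffs), sum(diffs) / len(diffs)


# Toy data: 999 exact matches plus a single outlier of 0.1.
reference = [0.0] * 1000
candidate = [0.0] * 999 + [0.1]

max_abs, mean_abs = parity_stats(reference, candidate)
print(f"max |delta| = {max_abs:.1e}, mean |delta| = {mean_abs:.1e}")
```

A single outlier dominates the maximum while barely moving the mean, so reporting both numbers helps separate a localized deviation from a uniformly degraded output.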
Fixed shapes (T=128, L=64) in this revision: pad/segment text to 128 and bucket latent length;
dynamic axes are a follow-up. The host runs the flow-matching ODE loop (vector_estimator × total_steps).
Assets to drive them: tts.json, unicode_indexer.json (G2P-free tokenizer), voice_styles/*.json.
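The host-side work described above (padding or segmenting text to the fixed length, then calling the denoiser once per step) can be sketched as follows. Everything here is illustrative: `segment_and_pad`, `run_ode_loop`, the pad id of 0, and the `step_fn(latent, step, total_steps)` signature are assumptions, not the repository's API. In particular, the sketch assumes each vector_estimator call returns the updated latent; if the graph returns a velocity instead, the update would be `latent + velocity / total_steps`.

```python
T = 128  # fixed text length of this revision


def segment_and_pad(token_ids, length=T, pad_id=0):
    """Split token_ids into chunks of at most `length` and right-pad each one.

    Returns a list of (ids, mask) pairs, where mask is 1 for real tokens
    and 0 for padding. pad_id=0 is an assumption.
    """
    chunks = []
    for start in range(0, len(token_ids), length):
        chunk = list(token_ids[start:start + length])
        n_real = len(chunk)
        n_pad = length - n_real
        chunks.append((chunk + [pad_id] * n_pad, [1] * n_real + [0] * n_pad))
    return chunks


def run_ode_loop(step_fn, latent, total_steps):
    """Call the denoiser total_steps times, feeding each result into the next call.

    step_fn stands in for one vector_estimator invocation (hypothetical signature).
    """
    for step in range(total_steps):
        latent = step_fn(latent, step, total_steps)
    return latent
```

Cutting at hard 128-token boundaries is the simplest policy; a real driver would more likely split at sentence or phrase boundaries so that no word is synthesized in two halves.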
Running on Android / Qualcomm NPU
- CPU/GPU: LiteRT (ai_edge_litert / TFLite) interpreter with XNNPACK/GPU delegate.
- Qualcomm HTP/NPU: the LiteRT QNN delegate at runtime, or compile to a QNN context binary via Qualcomm AI Hub (qai_hub) from these graphs (static shapes are HTP-friendly). int8/int4 PTQ via ai-edge-quantizer for full HTP residency is a follow-up.
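For the CPU path, a minimal Python sketch of driving one of these graphs through the LiteRT interpreter might look like this. `load_interpreter` and `run_graph` are hypothetical helpers; the input order is taken from whatever `get_input_details()` reports, since the tensor names of these graphs are not documented here, and delegate setup (GPU, QNN) is left out.

```python
def load_interpreter(model_path):
    """Create a LiteRT interpreter for a .tflite file and allocate its tensors."""
    # Imported lazily so run_graph below can be exercised without LiteRT installed.
    from ai_edge_litert.interpreter import Interpreter

    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return interpreter


def run_graph(interpreter, inputs):
    """Feed `inputs` in the order reported by the interpreter and return all outputs.

    With a real interpreter, each input must be a NumPy array whose shape and
    dtype match the corresponding entry of get_input_details().
    """
    input_details = interpreter.get_input_details()
    if len(inputs) != len(input_details):
        raise ValueError(
            f"graph expects {len(input_details)} inputs, got {len(inputs)}"
        )
    for detail, value in zip(input_details, inputs):
        interpreter.set_tensor(detail["index"], value)
    interpreter.invoke()
    return [
        interpreter.get_tensor(detail["index"])
        for detail in interpreter.get_output_details()
    ]
```

Inspecting `get_input_details()` once after loading is a cheap way to confirm the fixed shapes (T=128, L=64) before wiring the four graphs together.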
Attribution & license
- Weights: derivative of Supertone/supertonic-3 (commit 3cadd1ee6394adea1bd021217a0e650ede09a323), Supertone Inc., arXiv:2503.23108. License: OpenRAIL-M (use-based restrictions carry over).
Other Supertonic-3 formats
- Supertonic-3 ONNX (INT8): server / desktop (ONNX Runtime).
- Supertonic-3 CoreML: iOS / Apple Neural Engine (.mlpackage).
Ecosystem
- soniqo.audio: website / use-case explorer (transcription, voice cloning, live ASR, voice agents).
- speech-core: C++ orchestration library; Supertonic plugs in as a TTSInterface LiteRT model.
- speech-swift: Apple Silicon MLX + CoreML runtime.
- speech-android: Android SDK consuming on-device LiteRT bundles.
Other LiteRT models in this collection