How to use aufklarer/Supertonic-3-CoreML-FP16 with Supertonic:
```python
from supertonic import TTS

tts = TTS(auto_download=True)
style = tts.get_voice_style(voice_name="M1")
text = "The train delay was announced at 4:45 PM on Wed, Apr 3, 2024 due to track maintenance."
wav, duration = tts.synthesize(text, voice_style=style)
tts.save_audio(wav, "output.wav")
```
SupertonicTTS-3: CoreML FP16 (Apple Neural Engine)
Mixed-precision CoreML export of Supertonic-3's four non-autoregressive flow-matching graphs, tuned for Apple Neural Engine (ANE) residency on iOS 18+ / macOS 15+. The precision split was chosen by measured end-to-end SNR. Flow matching integrates an ODE whose trajectory is precision-sensitive, so the graphs that drive it stay FP32:
- TextEncoder + VectorEstimator: FP32 (they drive the ODE trajectory; the vector estimator is ANE-ineligible regardless of precision).
- Vocoder + DurationPredictor: FP16 (the vocoder is single-shot and is the 100%-ANE win).
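To illustrate why per-step rounding matters in an iterative solver, here is a toy sketch, not the model itself. It Euler-integrates the scalar ODE dx/dt = -x twice, once in full precision and once rounding every intermediate to IEEE half precision. The ODE, the step count, and the function names are assumptions chosen only for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def integrate(x0: float, steps: int, quantize: bool) -> float:
    """Euler-integrate dx/dt = -x over [0, 1] (toy ODE, hypothetical stand-in)."""
    x = x0
    dt = 1.0 / steps
    for _ in range(steps):
        v = -x  # stand-in for a vector-estimator call
        if quantize:
            # Round both the velocity and the updated state to FP16.
            v = to_fp16(v)
            x = to_fp16(x + dt * v)
        else:
            x = x + dt * v
    return x


ref = integrate(1.0, steps=8, quantize=False)
half = integrate(1.0, steps=8, quantize=True)
drift = abs(ref - half)  # rounding error accumulated along the trajectory
print(f"full precision: {ref:.9f}  fp16 per step: {half:.9f}  drift: {drift:.2e}")
```

The drift here is tiny because the toy state is a single scalar. In a high-dimensional latent, small per-step differences can steer the trajectory to a different endpoint, which is the effect the precision split above is meant to avoid.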
Controlled shared-noise A/B vs the FP32 reference: mag-STFT SNR 47-51 dB (en/de/ko), max|Δ| 6e-3 (transparent).
Converting every graph to FP16 instead drops this to ~24 dB through ODE trajectory divergence (not
degradation); this mixed split keeps measurable fidelity while still putting the vocoder on the ANE.
**189 MB** vs ~380 MB for FP32.
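The card does not spell out how the two A/B metrics are computed, so the following is a minimal sketch under assumed definitions. Here mag-STFT SNR is taken as 10·log10 of reference magnitude-spectrogram energy over the energy of the magnitude difference, and max|Δ| as the largest sample-wise absolute difference. The frame size, hop, Hann window, and function names are all assumptions. It uses a naive DFT so that it runs with the standard library alone.

```python
import cmath
import math


def stft_magnitudes(signal, frame=64, hop=32):
    """Magnitudes of Hann-windowed frames via a naive DFT (short signals only)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    mags = []
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame)]
        for k in range(frame // 2 + 1):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame)
                for n in range(frame)
            )
            mags.append(abs(acc))
    return mags


def mag_stft_snr_db(reference, candidate):
    """Assumed definition: reference energy over magnitude-difference energy, in dB."""
    ref = stft_magnitudes(reference)
    cand = stft_magnitudes(candidate)
    signal_power = sum(r * r for r in ref)
    noise_power = sum((r - c) ** 2 for r, c in zip(ref, cand))
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


def max_abs_delta(reference, candidate):
    """Largest sample-wise absolute difference between two waveforms."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Toy A/B: the "candidate" is the reference attenuated by 1%.
reference = [math.sin(2 * math.pi * 5 * n / 256) for n in range(512)]
candidate = [0.99 * s for s in reference]
snr_db = mag_stft_snr_db(reference, candidate)
delta = max_abs_delta(reference, candidate)
print(f"mag-STFT SNR: {snr_db:.2f} dB, max|delta|: {delta:.4f}")
```

A uniform 1% amplitude error gives an SNR of about 40 dB under this definition, which gives a rough sense of scale for the 47-51 dB and ~24 dB figures above. Real comparisons would feed both exports the same initial noise, as the shared-noise A/B does.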
This repo is self-contained: the four .mlpackage graphs + the G2P-free tokenizer assets
(tts.json, unicode_indexer.json) + voice_styles/. The host runs the flow-matching ODE loop
(vector_estimator × total_step); the graphs contain no control flow.
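The host-side loop described above might look roughly like the sketch below. It assumes uniform Euler steps and an estimator that returns a velocity for the current latent. Neither is confirmed by this card, and the callable signature is hypothetical. In a real app the callable would wrap a prediction call on the vector-estimator graph.

```python
from typing import Callable, List

Vector = List[float]


def run_flow_matching(
    vector_estimator: Callable[[Vector, int, int], Vector],
    noise: Vector,
    total_step: int,
) -> Vector:
    """Host-side ODE loop: one estimator call per step, no control flow in the graph.

    Assumes uniform Euler steps on t in [0, 1]; the real schedule may differ.
    """
    x = list(noise)
    dt = 1.0 / total_step
    for step in range(total_step):
        velocity = vector_estimator(x, step, total_step)
        x = [xi + dt * vi for xi, vi in zip(x, velocity)]
    return x


# Toy stand-in estimator: a straight-line flow from the noise to a fixed target.
noise = [0.5, -1.0, 2.0]
target = [1.0, 0.0, -1.0]


def toy_estimator(x: Vector, step: int, total_step: int) -> Vector:
    return [t - n for t, n in zip(target, noise)]


latent = run_flow_matching(toy_estimator, noise, total_step=5)
print(latent)  # close to `target`, up to floating-point rounding
```

Keeping the loop on the host means total_step can be changed at run time without re-exporting any graph.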
Parity reference / FP32:
aufklarer/Supertonic-3-CoreML.
Other Supertonic-3 formats
- Supertonic-3 CoreML: FP32 parity reference (iOS / ANE).
- Supertonic-3 ONNX (INT8): server / desktop (ONNX Runtime).
- Supertonic-3 LiteRT: Android / Qualcomm NPU (.tflite).
Attribution & license
Derivative of Supertone/supertonic-3 (commit
3cadd1ee6394adea1bd021217a0e650ede09a323), Supertone Inc., arXiv:2503.23108.
License: OpenRAIL-M (use-based restrictions carry over: no non-consensual impersonation/deepfakes, etc.).
Links
- soniqo.audio: website / use-case explorer.
- speech-swift: Apple Silicon MLX + CoreML runtime.
- speech-core: C++ orchestration library (TTSInterface).
- speech-android: Android LiteRT SDK.
- full CoreML Speech Models collection.