SupertonicTTS-3 — CoreML FP16 (Apple Neural Engine)

Mixed-precision CoreML export of the four graphs in Supertonic-3's non-autoregressive flow-matching pipeline, tuned for Apple Neural Engine (ANE) residency on iOS 18+ / macOS 15+. The precision split was chosen by measured end-to-end SNR. Flow-matching sampling integrates an ODE whose trajectory is precision-sensitive, so the graphs that drive it stay FP32:

- TextEncoder + VectorEstimator → FP32 (they drive the ODE trajectory; VectorEstimator is ANE-ineligible regardless of precision).
- Vocoder + DurationPredictor → FP16 (the vocoder runs single-shot and fits entirely on the ANE, which is where FP16 pays off).
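To see why an ODE's state is sensitive to precision, here is a deliberately exaggerated toy (not the real model): fixed-step Euler integration of dx/dt = v over t in [0, 1], with the state rounded to the working dtype after every step. The function name, the 1000-step count and the velocity are illustrative assumptions. The real sampler uses its own `total_step`, and CoreML/ANE FP16 accumulation differs from NumPy scalar arithmetic, so this only shows how per-step rounding of the state compounds.

```python
import numpy as np

def euler_constant_velocity(x0, velocity, total_step, dtype):
    """Fixed-step Euler for dx/dt = velocity over t in [0, 1], state kept in `dtype`."""
    x = dtype(x0)
    dt = dtype(1.0 / total_step)
    v = dtype(velocity)
    for _ in range(total_step):
        x = dtype(x + dt * v)  # re-round the state to `dtype` after every step
    return x

# Exact solution: x(1) = x0 + velocity = 1.05
print(euler_constant_velocity(1.0, 0.05, 1000, np.float16))  # stays at 1.0
print(euler_constant_velocity(1.0, 0.05, 1000, np.float32))  # close to 1.05
```

In FP16 each increment (about 5e-5) is below half the spacing between representable values near 1.0 (about 9.8e-4), so every step rounds away and the trajectory never moves. FP32 lands close to the exact result.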

Controlled shared-noise A/B against the FP32 reference (both runs start from the same initial noise): magnitude-STFT SNR 47–51 dB across en/de/ko, max|Δ| 6e-3, i.e. transparent. Running every graph in FP16 instead drops SNR to ~24 dB because the ODE trajectory diverges from the FP32 one (divergence, not degradation). The mixed split preserves the measured fidelity while still putting the vocoder on the ANE, at **189 MB** vs ~380 MB for the all-FP32 export.
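The exact definitions behind these numbers are not spelled out here. The sketch below assumes a common one: energy of the reference STFT magnitudes divided by the energy of the magnitude difference, in dB, computed on time-aligned, equal-length waveforms from the two shared-noise runs. The 1024-point periodic Hann window, hop of 256, and the function names are assumptions, as is reading max|Δ| as the peak absolute waveform difference.

```python
import numpy as np

def mag_stft(x, n_fft=1024, hop=256):
    """Magnitude STFT with a periodic Hann window (frame size and hop are assumptions)."""
    x = np.asarray(x, dtype=np.float64)
    if len(x) < n_fft:
        x = np.pad(x, (0, n_fft - len(x)))
    n_frames = 1 + (len(x) - n_fft) // hop
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def mag_stft_snr_db(reference, test, n_fft=1024, hop=256):
    """10 * log10(reference magnitude energy / magnitude-error energy)."""
    ref = mag_stft(reference, n_fft, hop)
    tst = mag_stft(test, n_fft, hop)
    err = np.sum((ref - tst) ** 2)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(np.sum(ref ** 2) / err))

def max_abs_diff(reference, test):
    """Peak absolute sample difference between two equal-length waveforms."""
    return float(np.max(np.abs(np.asarray(reference) - np.asarray(test))))

# Usage on a synthetic 1 s tone (sample rate chosen arbitrarily)
sr = 24000
t = np.arange(sr) / sr
ref = 0.5 * np.sin(2 * np.pi * 220 * t)
noisy = ref + 1e-3 * np.random.default_rng(0).standard_normal(sr)
print(mag_stft_snr_db(ref, noisy), max_abs_diff(ref, noisy))
```

Comparing magnitudes ignores phase, so small phase drift between the two runs does not count against the score. A waveform-domain SNR would penalize it.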

This repo is self-contained: the four .mlpackage graphs, the G2P-free tokenizer assets (tts.json, unicode_indexer.json), and voice_styles/. The graphs contain no control flow, so the host runs the flow-matching ODE loop itself, calling vector_estimator total_step times.
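A minimal sketch of that host-side loop, in NumPy with the graph abstracted as a Python callable. The callable signature `(latent, current_step, total_step) -> next_latent` and the names `run_flow_matching` and `make_toy_estimator` are hypothetical. Check the actual .mlpackage input/output names, and whether the graph returns the next latent or a velocity, against the model's spec. The toy estimator takes one Euler step along a straight noise-to-target path, so the loop can be exercised without CoreML.

```python
from typing import Callable

import numpy as np

# Hypothetical signature for one vector_estimator call.
StepFn = Callable[[np.ndarray, int, int], np.ndarray]

def run_flow_matching(vector_estimator: StepFn, noise: np.ndarray, total_step: int) -> np.ndarray:
    """Host-side ODE loop: the static graph is invoked once per step."""
    xt = noise
    for step in range(total_step):
        xt = vector_estimator(xt, step, total_step)
    return xt

def make_toy_estimator(target: np.ndarray) -> StepFn:
    """Stand-in graph: one Euler step along the straight path from noise to `target`."""
    def step_fn(xt: np.ndarray, current_step: int, total_step: int) -> np.ndarray:
        t = current_step / total_step
        dt = 1.0 / total_step
        velocity = (target - xt) / (1.0 - t)  # exact velocity for a linear noise->target path
        return xt + dt * velocity
    return step_fn

# Usage: the toy trajectory ends on the target after total_step calls
target = np.array([1.0, -2.0, 3.0])
noise = np.random.default_rng(0).standard_normal(3)
print(run_flow_matching(make_toy_estimator(target), noise, total_step=8))
```

One likely benefit of keeping the loop on the host is that total_step can change per request without re-exporting any graph.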

Parity reference / FP32: aufklarer/Supertonic-3-CoreML.

Other Supertonic-3 formats

- Supertonic-3 CoreML: FP32 parity reference (iOS / ANE).
- Supertonic-3 ONNX (INT8): server / desktop (ONNX Runtime).
- Supertonic-3 LiteRT: Android / Qualcomm NPU (.tflite).

Attribution & license

Derivative of Supertone/supertonic-3 (commit 3cadd1ee6394adea1bd021217a0e650ede09a323) by Supertone Inc., arXiv:2503.23108. License: OpenRAIL-M. Its use-based restrictions carry over (no non-consensual impersonation or deepfakes, etc.).