SupertonicTTS-3: CoreML FP16 (Apple Neural Engine)

Mixed-precision CoreML export of Supertonic-3's four non-autoregressive flow-matching graphs, tuned for Apple Neural Engine residency on iOS 18+ / macOS 15+. The precision split was chosen by measured end-to-end SNR. Flow matching is an ODE and its trajectory is precision-sensitive, so the graphs that drive it stay FP32:

  • TextEncoder + VectorEstimator → FP32 (these drive the ODE trajectory; the vector estimator is ANE-ineligible regardless of precision).
  • Vocoder + DurationPredictor → FP16 (the vocoder is single-shot and is where the 100%-ANE residency win comes from).
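
To illustrate why precision matters for an iterated ODE solve, the toy sketch below Euler-integrates a made-up vector field and rounds the state to half or single precision after every step. Everything in it (`toy_velocity`, the 32 steps, the 64 starting points) is an assumption for illustration, not Supertonic-3's vector estimator, and rounding through `struct` only emulates reduced-precision storage rather than real FP16 arithmetic on the ANE.

```python
import math
import struct


def round_to(value, fmt):
    """Round a Python float to IEEE half ('e') or single ('f') precision."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def toy_velocity(x, t):
    """Hypothetical smooth vector field; NOT the Supertonic vector estimator."""
    return math.sin(3.0 * x) + t


def integrate(x0, steps, fmt=None):
    """Euler-integrate from t=0 to t=1, rounding after every operation.

    fmt=None keeps full float64 and serves as the reference trajectory.
    """
    quantize = (lambda v: v) if fmt is None else (lambda v: round_to(v, fmt))
    dt = 1.0 / steps
    xs = [quantize(v) for v in x0]
    for i in range(steps):
        t = i * dt
        xs = [quantize(x + dt * quantize(toy_velocity(x, t))) for x in xs]
    return xs


def max_abs_error(a, b):
    """Largest element-wise absolute difference between two trajectories' end states."""
    return max(abs(p - q) for p, q in zip(a, b))


x0 = [0.1 + 0.9 * k / 63 for k in range(64)]
reference = integrate(x0, 32)
err_fp16 = max_abs_error(integrate(x0, 32, "e"), reference)
err_fp32 = max_abs_error(integrate(x0, 32, "f"), reference)
print(f"fp16 drift: {err_fp16:.2e}   fp32 drift: {err_fp32:.2e}")
```

In this toy setting the half-precision end state drifts much further from the reference than the single-precision one, which is the same qualitative effect the split above is designed to avoid.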

Controlled shared-noise A/B vs the FP32 reference: mag-STFT SNR 47–51 dB (en/de/ko), max|Δ| 6e-3, which is transparent. Running every graph in FP16 instead drops to ~24 dB through ODE trajectory divergence (not degradation), so this mixed split keeps measurable fidelity while still putting the vocoder on the ANE. **189 MB** vs ~380 MB for FP32.
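
The two metrics quoted above can be sketched as follows. This is a minimal illustration, not the evaluation script behind the reported numbers: the SNR definition used here (reference magnitude energy over the energy of the magnitude difference), the Hann window, and the frame and hop sizes are all assumptions, and the signals are synthetic sine waves rather than synthesized speech.

```python
import cmath
import math


def stft_magnitudes(signal, frame=64, hop=32):
    """Magnitude STFT with a Hann window, using a naive DFT (fine for short clips)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    mags = []
    for start in range(0, len(signal) - frame + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame)]
        for k in range(frame // 2 + 1):  # non-negative frequency bins only
            acc = sum(
                seg[n] * cmath.exp(-2j * math.pi * k * n / frame)
                for n in range(frame)
            )
            mags.append(abs(acc))
    return mags


def mag_stft_snr_db(reference, candidate):
    """Assumed definition: 10*log10(sum |S_ref|^2 / sum (|S_ref| - |S_cand|)^2)."""
    ref = stft_magnitudes(reference)
    cand = stft_magnitudes(candidate)
    signal_power = sum(r * r for r in ref)
    noise_power = sum((r - c) ** 2 for r, c in zip(ref, cand))
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


def max_abs_delta(reference, candidate):
    """Largest per-sample absolute difference between the two waveforms."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Synthetic stand-ins: a clean tone and the same tone with a small perturbation.
reference = [math.sin(2 * math.pi * 5 * n / 256) for n in range(512)]
candidate = [
    s + 1e-3 * math.sin(2 * math.pi * 40 * n / 256)
    for n, s in enumerate(reference)
]
snr = mag_stft_snr_db(reference, candidate)
delta = max_abs_delta(reference, candidate)
print(f"mag-STFT SNR: {snr:.1f} dB   max|delta|: {delta:.1e}")
```

For a shared-noise comparison like the one described, both waveforms would need to be generated from the same initial noise sample so that only the precision differs between them.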

This repo is self-contained: the four .mlpackage graphs + the G2P-free tokenizer assets (tts.json, unicode_indexer.json) + voice_styles/. The host runs the flow-matching ODE loop (vector_estimator × total_step); the graphs contain no control flow.
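
The host-side loop can be pictured with the sketch below. It is a shape-only illustration under stated assumptions: `run_flow_matching`, `step_fn`, and `toy_step` are hypothetical names, the real graph's input and output signature is not documented here, and the sketch assumes each call returns the updated latent (if the graph returns a velocity instead, the host would apply the update itself).

```python
def run_flow_matching(step_fn, latent, total_step):
    """Call step_fn once per step; all looping stays on the host side."""
    for step in range(total_step):
        latent = step_fn(latent, step, total_step)
    return latent


def toy_step(latent, step, total_step):
    """Stand-in for one vector-estimator call (hypothetical).

    Moves every value along a straight path toward 1.0, so that the
    final step lands on the target.
    """
    remaining = total_step - step
    return [x + (1.0 - x) / remaining for x in latent]


calls = []


def counting_step(latent, step, total_step):
    """Wrapper that records which step indices the host requested."""
    calls.append(step)
    return toy_step(latent, step, total_step)


noise = [0.0, 0.5, -2.0]
result = run_flow_matching(counting_step, noise, total_step=5)
print(calls, result)
```

Because the loop lives outside the graph, the step count can be changed at run time without re-exporting any model.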

Parity reference / FP32: aufklarer/Supertonic-3-CoreML.

Other Supertonic-3 formats

Attribution & license

Derivative of Supertone/supertonic-3 (commit 3cadd1ee6394adea1bd021217a0e650ede09a323), Supertone Inc., arXiv:2503.23108. Licensed under OpenRAIL-M (use-based restrictions carry over: no non-consensual impersonation/deepfakes, etc.).

Links
