Kokoro G2P (en-US) — LiteRT (preview)
⚠️ Labeled preview: fixed-length `[1, 96]`, FP32, CPU. Shared to complete the on-device Kokoro front-end per the LiteRT community direction ③.
A LiteRT (.tflite) conversion of DeepPhonemizer en_us_cmudict_forward (a small non-autoregressive forward Transformer), used as the neural grapheme-to-phoneme (G2P) front-end for on-device Kokoro-82M TTS. It gives Kokoro a phonemizer fallback so arbitrary free text — names, brands, numbers — synthesizes with zero dropped words when the dictionary phonemizer misses.
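The fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `phonemize`, `toy_lexicon`, and `toy_g2p` are hypothetical names, and `toy_g2p` is a placeholder standing in for a call into the converted model.

```python
from typing import Callable, Dict, List


def phonemize(
    words: List[str],
    lexicon: Dict[str, str],
    g2p_fallback: Callable[[str], str],
) -> List[str]:
    """Return one phoneme string per input word, never dropping a word."""
    phonemes = []
    for word in words:
        key = word.lower()
        entry = lexicon.get(key)
        if entry is None:
            # Dictionary miss: defer to the neural G2P instead of skipping.
            entry = g2p_fallback(key)
        phonemes.append(entry)
    return phonemes


# Toy data for illustration only (not taken from the real lexicon or model).
toy_lexicon = {"hello": "HH AH0 L OW1"}


def toy_g2p(word: str) -> str:
    # Placeholder: a real fallback would run the .tflite model here.
    return " ".join(ch.upper() for ch in word)


result = phonemize(["Hello", "Kokoro"], toy_lexicon, toy_g2p)
```

Because every miss is routed to the fallback, the output always has the same number of entries as the input word list.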
Files
| File | Precision | Size |
|---|---|---|
| `dp_g2p_litert.tflite` | fp32 | ~51 MB |
Specs
| Property | Value |
|---|---|
| Task | Grapheme-to-phoneme (English) |
| Source | DeepPhonemizer en_us_cmudict_forward |
| Input | 1 × 96 character IDs (fixed length 96, in-graph padding mask) |
| Output | per-position phoneme logits → ARPABET / IPA |
| Runtime | CPU (LiteRT CompiledModel API) |
| Verified | Pixel 8a — 12/12 vs the reference G2P, no dropped words |
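A rough sketch of what the fixed `1 × 96` input and the per-position output imply for calling code. Everything here is an assumption for illustration: the padding ID `0`, the toy vocabularies, and the helper names `encode_fixed` and `decode_argmax` are hypothetical, and the collapse of consecutive repeats is a CTC-style guess at the decoding step. The real character and phoneme tables come from the DeepPhonemizer checkpoint.

```python
from typing import Dict, List

SEQ_LEN = 96  # fixed input length from the spec table
PAD_ID = 0    # assumption: ID 0 is the padding symbol


def encode_fixed(text: str, char_to_id: Dict[str, int],
                 seq_len: int = SEQ_LEN, pad_id: int = PAD_ID) -> List[int]:
    """Map characters to IDs and right-pad to exactly seq_len entries."""
    # Characters missing from the vocabulary are skipped in this sketch.
    ids = [char_to_id[ch] for ch in text if ch in char_to_id]
    if len(ids) > seq_len:
        raise ValueError(f"input needs {len(ids)} slots, graph has {seq_len}")
    return ids + [pad_id] * (seq_len - len(ids))


def decode_argmax(logits: List[List[float]], id_to_phoneme: Dict[int, str],
                  pad_id: int = PAD_ID) -> List[str]:
    """Take the best class per position, collapse repeats, drop padding."""
    best = [max(range(len(row)), key=row.__getitem__) for row in logits]
    phonemes = []
    previous = None
    for idx in best:
        if idx != previous and idx != pad_id:
            phonemes.append(id_to_phoneme[idx])
        previous = idx
    return phonemes


# Toy vocabularies and logits, for illustration only.
demo_ids = encode_fixed("ab", {"a": 1, "b": 2})
demo_logits = [
    [0.1, 0.9, 0.0],    # class 1
    [0.1, 0.8, 0.1],    # class 1 again (collapsed)
    [0.0, 0.1, 0.9],    # class 2
    [0.9, 0.05, 0.05],  # padding
]
demo_phonemes = decode_argmax(demo_logits, {1: "AH", 2: "B"})
```

Inputs longer than 96 slots raise an error in this sketch; a caller would need to split long text into words or chunks before encoding.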
How it was converted / why CPU
- Stock official converter (`litert_torch`), static-shape graph: the dynamic-length export hits the same symbolic-sequence-length wall as the TTS model (`Shapes must be 1D sequences of concrete values…`). This is the C8 / dynamic-shape class. Worked around with a static `[1, 96]` graph + an in-graph padding mask; converts cleanly and is numerically correct.
- CPU-only: the attention's fused-QKV 5-D layout + the mask's `EQUAL`/`SELECT_V2` keep it off the GPU delegate; decomposing the attention to ≤ 4-D would clear that.
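To illustrate why a padding mask lowers to compare-and-select ops, here is a plain-Python sketch of masking one row of attention scores. It is not the converted graph: `padding_mask`, `masked_softmax`, and the padding ID `0` are hypothetical, and the comparison and the conditional pick stand in for the `EQUAL` and `SELECT_V2` ops named above.

```python
import math
from typing import List

PAD_ID = 0  # assumption: ID 0 marks padded positions
NEG_INF = float("-inf")


def padding_mask(ids: List[int], pad_id: int = PAD_ID) -> List[bool]:
    """Element-wise comparison (the EQUAL step): True where padded."""
    return [token == pad_id for token in ids]


def masked_softmax(scores: List[float], is_pad: List[bool]) -> List[float]:
    """Conditional pick (the SELECT step), then a stable softmax."""
    if all(is_pad):
        raise ValueError("every position is padding; softmax is undefined")
    masked = [NEG_INF if pad else s for s, pad in zip(scores, is_pad)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Two real characters followed by two padded slots.
ids = [5, 7, 0, 0]
scores = [1.0, 1.0, 3.0, 2.0]
weights = masked_softmax(scores, padding_mask(ids))
```

Padded positions receive zero attention weight regardless of their raw scores, which is what lets a fixed-length graph give the same result as a variable-length one.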
Training data
DeepPhonemizer en_us_cmudict_forward is trained on the CMU Pronouncing Dictionary (CMUdict) — ~126k common English words paired with ARPABET pronunciations (a public pronunciation lexicon). It learns the grapheme→phoneme spelling-to-sound mapping only. This LiteRT artifact is a format conversion of the released checkpoint and introduces no additional training data.
PII
No personally identifiable information. CMUdict is a public dictionary of common English word pronunciations (no personal data); none is added during conversion.
Roadmap
- Variable-length input, quantization, and (ideally) GPU support are gated on the dynamic-shape converter work (C8) and on re-authoring the attention to ≤ 4-D.
Status
Labeled preview — part of an on-device free-text Kokoro-82M LiteRT pipeline; a clean runnable Android sample is in progress.
License
MIT. Full attribution to DeepPhonemizer and the en_us_cmudict_forward checkpoint.