Multilingual G2P ByT5 Small — ONNX
ONNX export of charsiu/g2p_multilingual_byT5_small. Converts written words to IPA transcriptions across 100 languages.
| Property | Value |
|---|---|
| Architecture | ByT5-small (T5ForConditionalGeneration), ~300M params, d_model=1472, 12 layers |
| ONNX FP32 size | 1513 MB (3 graphs: encoder + decoder + decoder_with_past) |
| ONNX INT8 size | 379 MB (75% reduction) |
| Best latency | ~135 ms/word (INT8, threads=1), 2.78x faster than PyTorch CPU |
| License | |
Quick Start
```python
import onnxruntime as ort
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

so = ort.SessionOptions()
so.intra_op_num_threads = 1  # single thread is fastest for this model
so.inter_op_num_threads = 1

model = ORTModelForSeq2SeqLM.from_pretrained(
    "klebster/g2p_multilingual_byT5_small_onnx",
    provider="CPUExecutionProvider",
    session_options=so,
)
tokenizer = AutoTokenizer.from_pretrained("klebster/g2p_multilingual_byT5_small_onnx")

inputs = tokenizer("<eng-us>: hello", padding=True, add_special_tokens=False, return_tensors="pt")
preds = model.generate(**inputs, num_beams=1, max_length=50)
print(tokenizer.decode(preds[0], skip_special_tokens=True))
# Output: ˈhɛɫoʊ
```
Input format: <language_code>: word (e.g. <fra>: bonjour, <ger>: Straße). See CharsiuG2P for all 100 language codes.
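The input format can be wrapped in a small helper so that the language tag is always written the same way. This is a minimal sketch: `format_g2p_input` and `format_batch` are hypothetical names, not part of CharsiuG2P or this repository, and the helper does not check that a code is one of the 100 supported ones.

```python
def format_g2p_input(word: str, lang: str) -> str:
    """Build a prompt of the form '<lang>: word' (hypothetical helper)."""
    word = word.strip()
    lang = lang.strip().strip("<>")  # accept either 'fra' or '<fra>'
    if not word or not lang:
        raise ValueError("both word and language code must be non-empty")
    return f"<{lang}>: {word}"


def format_batch(words, lang):
    """Format several words that share one language code."""
    return [format_g2p_input(w, lang) for w in words]


print(format_g2p_input("hello", "eng-us"))  # <eng-us>: hello
print(format_batch(["bonjour", "merci"], "fra"))
```

The resulting strings can be passed to the tokenizer from Quick Start in place of the literal prompt; a list of prompts relies on `padding=True` to form one batch.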
Benchmark Summary
Tested on 15 words across 10+ languages, 30 reps, greedy decoding. Hardware: Intel i9-13900KS, 128 GB DDR5.
| Configuration | ms/word | vs PyTorch CPU |
|---|---|---|
| ONNX INT8 + threads=1 | ~135 | 2.78x faster |
| ONNX INT8 + threads=8 | ~196 | 1.92x faster |
| ONNX FP32 + threads=8 | ~392 | 0.96x |
| PyTorch CPU (baseline) | ~375 | 1.00x |
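
The last column is the baseline latency divided by each configuration's latency. The sketch below recomputes it from the rounded ms/word figures in the table; because those inputs are approximate, a recomputed ratio can differ from the table in the last digit. The `speedup` function name is an illustrative assumption.

```python
# ms/word figures copied from the table above (approximate values).
latency_ms = {
    "ONNX INT8 + threads=1": 135,
    "ONNX INT8 + threads=8": 196,
    "ONNX FP32 + threads=8": 392,
    "PyTorch CPU (baseline)": 375,
}


def speedup(config, baseline="PyTorch CPU (baseline)"):
    """Speedup relative to the baseline: baseline latency / config latency."""
    return latency_ms[baseline] / latency_ms[config]


for name in latency_ms:
    print(f"{name}: {speedup(name):.2f}x")
```

A value below 1.00x means the configuration is slower than the PyTorch CPU baseline.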
For the small model, single-threaded INT8 is about 1.45x faster than 8-thread INT8 (~135 vs ~196 ms/word), likely because the decoder runs sequentially and threading overhead dominates at this model size. This result may depend on CPU architecture and overall system configuration.
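Since the best thread count may differ on other hardware, it can be worth timing a few settings locally. The following is a hypothetical timing harness, not the script used for the numbers above; `ms_per_word` and `dummy_g2p` are illustrative names, and `dummy_g2p` only stands in for a real model call.

```python
import time
from statistics import mean


def ms_per_word(g2p, words, reps=30):
    """Average wall-clock milliseconds per word over `reps` passes."""
    per_pass = []
    for _ in range(reps):
        start = time.perf_counter()
        for w in words:
            g2p(w)
        elapsed = time.perf_counter() - start
        per_pass.append(1000.0 * elapsed / len(words))
    return mean(per_pass)


def dummy_g2p(text):
    """Placeholder for a real call (tokenize, generate, decode)."""
    return text[::-1]


prompts = ["<eng-us>: hello", "<fra>: bonjour"]
print(f"{ms_per_word(dummy_g2p, prompts, reps=5):.4f} ms/word")
```

To compare thread counts, one option is to load one model per `SessionOptions` setting as in Quick Start and pass a function wrapping the tokenizer, `model.generate`, and `tokenizer.decode` as `g2p`.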
Correctness: ONNX FP32 output is bit-identical to PyTorch on spot checks; no full 100-language evaluation has been run for this model. For a complete evaluation see the tiny model card.
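A spot check of this kind amounts to an exact string comparison between decoded outputs from the two backends. The sketch below shows one way to report it, assuming both lists were produced from the same prompts in the same order; `compare_outputs` is a hypothetical helper and the placeholder strings are not real model output.

```python
def compare_outputs(reference, candidate):
    """Return (match_rate, mismatches) for two aligned lists of transcriptions."""
    if len(reference) != len(candidate):
        raise ValueError("lists must have the same length")
    mismatches = [
        (i, ref, cand)
        for i, (ref, cand) in enumerate(zip(reference, candidate))
        if ref != cand
    ]
    total = len(reference)
    match_rate = 1.0 if total == 0 else 1 - len(mismatches) / total
    return match_rate, mismatches


# Illustrative inputs only: the second pair uses placeholder strings.
pytorch_out = ["ˈhɛɫoʊ", "placeholder-1"]
onnx_out = ["ˈhɛɫoʊ", "placeholder-2"]
rate, diffs = compare_outputs(pytorch_out, onnx_out)
print(rate, diffs)
```

A match rate of 1.0 with an empty mismatch list corresponds to identical outputs on the checked words.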
Known Issues
German IPA quality: non-standard dialect
The model does not reliably produce Standard German (Hochdeutsch). Observed issues include use of alveolar flap /ɾ/ where Standard German uses uvular fricative /ʁ/, among other systematic deviations. See CharsiuG2P issue #20.
Spanish dialect dictionaries: spa and spa-me are identical
The spa (European Spanish) and spa-me (Mexican Spanish) dictionaries are identical in the upstream CharsiuG2P repository. They should differ: European Spanish distinguishes /s/ from /θ/ (distinción), whereas Mexican Spanish merges both into /s/ (seseo). See CharsiuG2P issue #15.
Links
- Paper: Zhu et al. (2022), Interspeech
- Original repo: CharsiuG2P
- Base model: charsiu/g2p_multilingual_byT5_small
- Tiny ONNX sibling: klebster/g2p_multilingual_byT5_tiny_onnx
- ONNX export by: klebster for phone-similarity
Citation
```bibtex
@misc{zhu2022byt5modelmassivelymultilingual,
  title={ByT5 model for massively multilingual grapheme-to-phoneme conversion},
  author={Jian Zhu and Cong Zhang and David Jurgens},
  year={2022},
  eprint={2204.03067},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2204.03067},
}

@misc{noel2026g2pmultilingualbyT5smallonnx,
  title={Multilingual G2P ByT5 Small --- ONNX export},
  author={Kleber Noel},
  year={2026},
  month={apr},
  url={https://huggingface.co/klebster/g2p_multilingual_byT5_small_onnx},
}
```