Multilingual G2P ByT5 Small — ONNX

ONNX export of charsiu/g2p_multilingual_byT5_small. Converts written words to IPA transcriptions across 100 languages.

Architecture: ByT5-small (T5ForConditionalGeneration), ~300M params, d_model=1472, 12 encoder layers and 4 decoder layers
ONNX FP32 size: 1513 MB (3 graphs: encoder + decoder + decoder_with_past)
ONNX INT8 size: 379 MB (75% reduction)
Best latency: ~135 ms/word (INT8, threads=1), 2.78x faster than PyTorch CPU
License: CC BY 4.0
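The card does not say how the INT8 graphs were produced. A common route for transformer ONNX exports is ONNX Runtime dynamic quantization, which stores each weight as an 8-bit integer plus a floating-point scale. Storing 1 byte per weight instead of 4 is what makes a size reduction of roughly 75% possible. The following is a minimal pure-Python sketch of symmetric per-tensor INT8 quantization. It illustrates the idea and is not necessarily the exact scheme used for this export:

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8: scale = max|w| / 127, q = round(w / scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Clamp to the symmetric range [-127, 127] so 0 maps exactly to 0.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 values and the scale."""
    return [v * scale for v in q]


def size_reduction(fp32_mb: float, int8_mb: float) -> float:
    """Fractional size saving of the quantized model."""
    return 1.0 - int8_mb / fp32_mb


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Rounding to the nearest step bounds the per-weight error by scale / 2.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_err <= scale / 2)
print(f"{size_reduction(1513, 379):.1%}")  # 75.0%
```

Real quantizers often use per-channel scales or asymmetric zero points to reduce the error further. Those details are not documented for this export.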

Quick Start

import onnxruntime as ort
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

so = ort.SessionOptions()
so.intra_op_num_threads = 1   # single thread is fastest for this model
so.inter_op_num_threads = 1

model = ORTModelForSeq2SeqLM.from_pretrained(
    "klebster/g2p_multilingual_byT5_small_onnx",
    provider="CPUExecutionProvider",
    session_options=so,
)
tokenizer = AutoTokenizer.from_pretrained("klebster/g2p_multilingual_byT5_small_onnx")

inputs = tokenizer("<eng-us>: hello", padding=True, add_special_tokens=False, return_tensors="pt")
preds = model.generate(**inputs, num_beams=1, max_length=50)
print(tokenizer.decode(preds[0], skip_special_tokens=True))
# Output: ˈhɛɫoʊ

Input format: <language_code>: word (e.g. <fra>: bonjour, <ger>: Straße). See CharsiuG2P for all 100 language codes.
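ByT5 has no subword vocabulary. Its tokenizer maps every UTF-8 byte of the prompt to an id shifted by 3, because ids 0, 1 and 2 are reserved for <pad>, </s> and <unk>. The sketch below shows the prompt format and that byte mapping, which can help when checking inputs or estimating sequence length. `make_prompt` and `byt5_ids` are illustrative helpers and are not part of transformers. `byt5_ids` appends no </s>, which mirrors `add_special_tokens=False` in the Quick Start.

```python
from typing import List

BYT5_OFFSET = 3  # ids 0, 1, 2 are <pad>, </s>, <unk>; byte b maps to id b + 3


def make_prompt(lang_code: str, word: str) -> str:
    """Build the '<language_code>: word' input string."""
    return f"<{lang_code}>: {word}"


def byt5_ids(text: str) -> List[int]:
    """Byte-level ids as ByT5 assigns them, without appending </s>."""
    return [b + BYT5_OFFSET for b in text.encode("utf-8")]


prompt = make_prompt("ger", "Straße")
print(prompt)                              # <ger>: Straße
print(len(prompt), len(byt5_ids(prompt)))  # 13 14
```

Non-ASCII letters such as ß take two bytes. The German prompt above is therefore 13 characters but 14 input tokens.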

Benchmark Summary

Benchmarked on 15 words across 10+ languages, 30 repetitions, greedy decoding. Hardware: Intel i9-13900KS, 128 GB DDR5.

| Configuration | ms/word | Speed vs PyTorch CPU |
|---|---|---|
| ONNX INT8 + threads=1 | ~135 | 2.78x faster |
| ONNX INT8 + threads=8 | ~196 | 1.92x faster |
| ONNX FP32 + threads=8 | ~392 | 0.96x (slightly slower) |
| PyTorch CPU (baseline) | ~375 | 1.00x |
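The benchmark script is not published with this card, so the exact procedure is an assumption: whether the mean or median was used, whether there was a warm-up, and which timer was used. The following is a minimal sketch of a per-word timing harness, together with the arithmetic behind the speed column (baseline ms divided by configuration ms). `g2p` is a hypothetical callable that wraps tokenize, generate and decode for one prompt.

```python
import statistics
import time
from typing import Callable, Dict, List


def ms_per_word(g2p: Callable[[str], str], prompts: List[str], reps: int = 30) -> float:
    """Median wall-clock milliseconds per word over `reps` passes through `prompts`."""
    g2p(prompts[0])  # warm-up call, excluded from timing
    per_pass: List[float] = []
    for _ in range(reps):
        start = time.perf_counter()
        for p in prompts:
            g2p(p)
        per_pass.append((time.perf_counter() - start) * 1000 / len(prompts))
    return statistics.median(per_pass)


def speedups(results_ms: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Speed of each configuration relative to `baseline` (above 1.0 means faster)."""
    base = results_ms[baseline]
    return {name: base / ms for name, ms in results_ms.items()}


# Usage with a trivial stand-in for a real G2P call:
print(ms_per_word(lambda s: s.upper(), ["<eng-us>: hello", "<fra>: bonjour"], reps=5))
print(speedups({"ONNX INT8 + threads=1": 135.0, "PyTorch CPU": 375.0}, "PyTorch CPU"))
```

With the rounded values from the table, the INT8 single-thread row comes out at 375 / 135 ≈ 2.78.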

For the small model, single-threaded INT8 (~135 ms/word) is substantially faster than 8-thread INT8 (~196 ms/word). This is likely because the decoder is sequential (one output token at a time), combined with threading overhead at this model size. Results may differ on other CPU architectures and system configurations.
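The greedy loop that `model.generate` typically runs over the three exported graphs shows the decoder's sequential dependency. The encoder runs once per word. The decoder graph (no past) produces the first token and the key/value cache. decoder_with_past then produces each later token from the previous token and the cache, so no step can start before the previous one finishes. Because ByT5 emits UTF-8 bytes, the ˈhɛɫoʊ output in Quick Start is 10 bytes, which means about 11 decoder calls including the end-of-sequence step. In the minimal sketch below, `encode`, `decode_first` and `decode_with_past` are hypothetical stand-ins for the ONNX sessions. They are not the Optimum API, and real sessions exchange NumPy arrays keyed by input and output names.

```python
from typing import Any, Callable, List, Tuple

Logits = List[float]


def argmax(xs: Logits) -> int:
    """Index of the largest logit (the greedy choice)."""
    return max(range(len(xs)), key=xs.__getitem__)


def greedy_generate(
    input_ids: List[int],
    encode: Callable[[List[int]], Any],                               # encoder graph
    decode_first: Callable[[int, Any], Tuple[Logits, Any]],           # decoder graph
    decode_with_past: Callable[[int, Any, Any], Tuple[Logits, Any]],  # decoder_with_past
    start_id: int = 0,  # T5-family models start decoding from the pad id
    eos_id: int = 1,    # </s> in the ByT5 vocabulary
    max_new_tokens: int = 50,
) -> List[int]:
    enc = encode(input_ids)                     # run once per input word
    logits, past = decode_first(start_id, enc)  # first step also builds the cache
    out: List[int] = []
    while len(out) < max_new_tokens:
        tok = argmax(logits)
        if tok == eos_id:
            break
        out.append(tok)
        # Each step depends on the previous token: this is the sequential part.
        logits, past = decode_with_past(tok, enc, past)
    return out


# Toy stand-ins: a "decoder" that replays a fixed sequence, then emits </s>.
TARGET = [10, 20, 30]


def one_hot(k: int, size: int = 40) -> Logits:
    return [1.0 if i == k else 0.0 for i in range(size)]


def toy_first(tok: int, enc: Any) -> Tuple[Logits, Any]:
    return one_hot(TARGET[0]), 1  # the "cache" here is just a step counter


def toy_with_past(tok: int, enc: Any, past: Any) -> Tuple[Logits, Any]:
    nxt = TARGET[past] if past < len(TARGET) else 1
    return one_hot(nxt), past + 1


print(greedy_generate([1, 2, 3], lambda ids: ids, toy_first, toy_with_past))  # [10, 20, 30]
```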

Correctness: ONNX FP32 output is bit-identical to PyTorch on spot checks; no full 100-language evaluation has been run for this model. For a complete evaluation see the tiny model card.
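No full evaluation has been published yet, and PER and WER are still listed as TBD. For G2P these metrics are commonly computed as follows. PER is the total edit distance divided by the total reference length. WER is the fraction of words whose transcription is not exactly right. A backend comparison works the same way, with the PyTorch output as reference: identical outputs give zero for both rates. In this sketch, symbols are assumed to be Unicode code points rather than segmented phones, so a diacritic counts as its own symbol. The CharsiuG2P evaluation may segment differently. The second pair below is invented to illustrate the /ɾ/-for-/ʁ/ substitution noted under Known Issues.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance; substitutions, insertions and deletions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def per_wer(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """(PER, WER) over (reference, hypothesis) pairs, using code points as symbols."""
    edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    ref_len = sum(len(ref) for ref, _ in pairs)
    wrong = sum(ref != hyp for ref, hyp in pairs)
    return edits / max(ref_len, 1), wrong / max(len(pairs), 1)


pairs = [
    ("ˈhɛɫoʊ", "ˈhɛɫoʊ"),      # Quick Start output, treated as correct here
    ("ˈʃtʁaːsə", "ˈʃtɾaːsə"),  # invented example: /ɾ/ substituted for /ʁ/
]
per, wer = per_wer(pairs)
print(f"PER={per:.3f} WER={wer:.2f}")  # PER=0.071 WER=0.50
```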

Known Issues

German IPA quality: non-standard dialect

The model does not reliably produce Standard German (Hochdeutsch). Observed issues include use of alveolar flap /ɾ/ where Standard German uses uvular fricative /ʁ/, among other systematic deviations. See CharsiuG2P issue #20.

Spanish dialect dictionaries: spa and spa-me are identical

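The spa (European Spanish) and spa-me (Mexican Spanish) dictionaries are identical in the upstream CharsiuG2P repository. They should differ in how they treat /s/ and /θ/. European Spanish distinguishes the two (distinción, e.g. casa with /s/ vs. caza with /θ/), while Mexican Spanish merges both into /s/ (seseo). See CharsiuG2P issue #15.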
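One way to confirm the upstream issue, or to check whether it has been fixed, is to diff the two dictionaries and measure how often /θ/ appears. Under distinción, a noticeable share of spa entries should contain θ. Under seseo, spa-me entries should contain none. The minimal sketch below assumes each dictionary is a UTF-8 TSV of `word<TAB>pronunciation` lines. The file names in the comment are hypothetical, and the in-memory entries are invented for illustration.

```python
from typing import Dict, Iterable


def load_dict(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'word<TAB>pronunciation' lines (assumed format) into a dict."""
    entries: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        word, pron = line.split("\t", 1)
        entries[word] = pron
    return entries


def theta_rate(d: Dict[str, str]) -> float:
    """Fraction of pronunciations containing /θ/."""
    return sum("θ" in p for p in d.values()) / max(len(d), 1)


def compare(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, float]:
    shared = a.keys() & b.keys()
    differing = sum(a[w] != b[w] for w in shared)
    return {"shared": len(shared), "differing": differing,
            "theta_rate_a": theta_rate(a), "theta_rate_b": theta_rate(b)}


# With real files (hypothetical paths):
# with open("spa.tsv", encoding="utf-8") as fa, open("spa-me.tsv", encoding="utf-8") as fb:
#     print(compare(load_dict(fa), load_dict(fb)))
spa = load_dict(["cielo\tθjelo\n", "casa\tkasa\n"])
spa_me = load_dict(["cielo\tsjelo\n", "casa\tkasa\n"])
print(compare(spa, spa_me))
```

If the two upstream files are identical, as reported, `differing` will be 0 and both θ rates will match.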


Citation

@misc{zhu2022byt5modelmassivelymultilingual,
      title={ByT5 model for massively multilingual grapheme-to-phoneme conversion},
      author={Jian Zhu and Cong Zhang and David Jurgens},
      year={2022},
      eprint={2204.03067},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2204.03067},
}

@misc{noel2026g2pmultilingualbyT5smallonnx,
      title={Multilingual G2P ByT5 Small — ONNX export},
      author={Kleber Noel},
      year={2026},
      month={apr},
      url={https://huggingface.co/klebster/g2p_multilingual_byT5_small_onnx},
}

Evaluation results

PER (ONNX INT8, greedy) on CharsiuG2P test set (100 languages, 500 words each): TBD (self-reported)
WER (ONNX INT8, greedy) on CharsiuG2P test set (100 languages, 500 words each): TBD (self-reported)