Nepali Reverse G2P Linguistic v1

Learned reverse-G2P model for Nepali:

phone tokens -> ranked Devanagari spelling candidates
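The decoder's --beam-size and --top-k flags suggest that candidates are ranked by beam search. The sketch below is a generic beam search, not the code in predict_reverse_g2p_model.py. The next_scores callable, the EOS marker and the toy probabilities are all made up for illustration. At each step it keeps the beam_size best partial spellings, and it returns the top_k completed spellings ranked by total log-probability.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"  # hypothetical end-of-sequence marker

# Hypothetical scorer: maps the graphemes emitted so far to log-probabilities
# of each possible next grapheme (EOS ends the spelling).
NextScores = Callable[[Tuple[str, ...]], Dict[str, float]]


def beam_search(next_scores: NextScores, beam_size: int, top_k: int,
                max_len: int = 20) -> List[Tuple[str, float]]:
    """Return up to top_k (spelling, total log-prob) pairs, best first."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    finished: List[Tuple[Tuple[str, ...], float]] = []
    for _ in range(max_len):
        expansions = []
        for prefix, score in beams:
            for token, logp in next_scores(prefix).items():
                if token == EOS:
                    finished.append((prefix, score + logp))  # completed spelling
                else:
                    expansions.append((prefix + (token,), score + logp))
        # Prune to the beam_size highest-scoring unfinished prefixes.
        beams = sorted(expansions, key=lambda b: b[1], reverse=True)[:beam_size]
        if not beams:
            break
    # Prefixes still unfinished after max_len steps are discarded.
    finished.sort(key=lambda b: b[1], reverse=True)
    return [("".join(prefix), score) for prefix, score in finished[:top_k]]


# Made-up next-grapheme log-probabilities, keyed by the prefix emitted so far;
# any prefix not listed here ends with EOS at log-prob 0.
TOY = {
    (): {"आ": math.log(0.9), "अ": math.log(0.1)},
    ("आ",): {"ज": math.log(0.8), "झ": math.log(0.2)},
    ("अ",): {"ज": math.log(1.0)},
}

result = beam_search(lambda p: TOY.get(p, {EOS: 0.0}), beam_size=8, top_k=2)
print([spelling for spelling, _ in result])  # ['आज', 'आझ']
```

In a real decoder, next_scores would come from the model conditioned on the input phones. Here a lookup table stands in for it.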

This checkpoint targets the spoken_nepali_linguistic profile, aligned with Wikipedia/Khatiwada-style affricate labels:

च  -> ts
छ  -> tsh
ज  -> dz
झ  -> dzh

It intentionally does not use the separate TTS affricate-rewrite profile (ch/chh/j/jh), so input phone strings should use the ts/tsh/dz/dzh labels above.
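If upstream phone strings come from the TTS profile, one option is to rewrite the affricate tokens before decoding. This sketch assumes that the TTS labels ch/chh/j/jh correspond to च/छ/ज/झ in that order and that tokens are separated by whitespace. Both are assumptions, and the helper name to_linguistic_profile is hypothetical.

```python
# Affricate labels in the spoken_nepali_linguistic profile (from this card).
LINGUISTIC_AFFRICATES = {"च": "ts", "छ": "tsh", "ज": "dz", "झ": "dzh"}

# ASSUMPTION: the TTS affricate-rewrite profile labels the same four letters
# ch/chh/j/jh in this order; verify against your own phone set.
TTS_AFFRICATES = {"च": "ch", "छ": "chh", "ज": "j", "झ": "jh"}

# Map each TTS label to the linguistic label for the same Devanagari letter.
TTS_TO_LINGUISTIC = {TTS_AFFRICATES[k]: v for k, v in LINGUISTIC_AFFRICATES.items()}


def to_linguistic_profile(phones: str) -> str:
    """Rewrite whitespace-separated TTS affricate tokens to the linguistic labels."""
    return " ".join(TTS_TO_LINGUISTIC.get(tok, tok) for tok in phones.split())


print(to_linguistic_profile("aa . j ax"))  # aa . dz ax
```

This maps whole tokens rather than substrings, so chh is never partially rewritten by the ch rule.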

Files

  • checkpoint.pt: PyTorch checkpoint with model config, vocabularies, metrics, and state dict.
  • metrics.json: training/dev/test metrics.
  • corpus_summary.json: training corpus summary.
  • reverse_model.py: model definition.
  • predict_reverse_g2p_model.py: CLI decoder.
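The checkpoint can be opened with torch.load to check what it contains. The card does not document the top-level key names, so describe_checkpoint only reports whatever keys it finds. Both load_files and describe_checkpoint are hypothetical helpers, and the directory layout (checkpoint.pt and metrics.json side by side) is assumed from the file list. Recent PyTorch releases default torch.load to weights_only=True. If that setting rejects the file, pass weights_only=False only for a checkpoint you trust.

```python
import json
from pathlib import Path
from typing import Any, Dict, Tuple


def describe_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, str]:
    """Summarise each top-level entry of a loaded checkpoint by type (and size for dicts)."""
    summary = {}
    for key, value in ckpt.items():
        if isinstance(value, dict):
            summary[key] = f"dict with {len(value)} entries"
        else:
            summary[key] = type(value).__name__
    return summary


def load_files(model_dir: str) -> Tuple[Dict[str, Any], Any]:
    # torch is imported lazily so describe_checkpoint() works without it.
    import torch

    root = Path(model_dir)
    # map_location="cpu" lets the checkpoint load on machines without a GPU.
    ckpt = torch.load(root / "checkpoint.pt", map_location="cpu")
    metrics = json.loads((root / "metrics.json").read_text(encoding="utf-8"))
    return ckpt, metrics
```

Usage: `ckpt, metrics = load_files("."); print(describe_checkpoint(ckpt))`.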

Metrics

Selected checkpoint:

best epoch: 19
test loss: 0.2878
test exact: 0.7795
test character error rate: 0.0486
test rows: 957
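The following sketch shows how exact match and character error rate (CER) can be computed. The definition is an assumption. Here CER is the total number of character edits divided by the total number of reference characters (corpus level), counted over Unicode code points, so a conjunct such as क्ष counts as three characters. The card does not say whether it uses this convention, per-row averaging, or grapheme clusters. If test exact is a per-row fraction, 0.7795 of 957 rows is about 746 exactly matched rows.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def exact_and_cer(predictions: List[str], references: List[str]) -> Tuple[float, float]:
    """Exact-match rate and corpus-level CER over paired strings."""
    exact = sum(p == r for p, r in zip(predictions, references)) / len(references)
    edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    chars = sum(len(r) for r in references)  # Unicode code points
    return exact, edits / chars


print(exact_and_cer(["आज", "घार"], ["आज", "घर"]))  # (0.5, 0.25)
```

In the example, घार differs from the reference घर by one inserted vowel sign, giving one edit over four reference code points.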

Usage

python predict_reverse_g2p_model.py \
  --checkpoint checkpoint.pt \
  --phones "aa . dz ax" \
  --beam-size 8 \
  --top-k 5
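To call the decoder from Python, the same command can be assembled as an argument list. This sketch uses only the flags shown above. build_command and predict are hypothetical helper names. The card does not describe the CLI's output format, so predict returns raw stdout rather than parsed candidates.

```python
import subprocess
import sys
from typing import List


def build_command(phones: str, checkpoint: str = "checkpoint.pt",
                  beam_size: int = 8, top_k: int = 5) -> List[str]:
    """Build the argv for predict_reverse_g2p_model.py with the documented flags."""
    return [
        sys.executable, "predict_reverse_g2p_model.py",
        "--checkpoint", checkpoint,
        "--phones", phones,
        "--beam-size", str(beam_size),
        "--top-k", str(top_k),
    ]


def predict(phones: str, **kwargs) -> str:
    # check=True raises CalledProcessError if the decoder exits non-zero.
    result = subprocess.run(build_command(phones, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing a list to subprocess.run without shell=True keeps "aa . dz ax" as a single --phones argument, so the spaces need no extra quoting.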

Limitations

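Nepali reverse G2P is one-to-many: the same phone sequence can have several valid Devanagari spellings (for example, स, श and ष are typically all pronounced /s/). Use this model as a candidate generator, then rerank the candidates with forward-G2P roundtrip distance, word frequency, and source priority before promoting them into a lexicon.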
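A minimal reranking sketch, assuming a linear combination of the four signals named above. The forward G2P function, frequency counts, source labels and weights are placeholders you would supply. None of them come from this model, and the weights shown are arbitrary. Roundtrip distance is the token-level edit distance between the forward G2P output for a candidate and the input phones.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Candidate:
    spelling: str          # Devanagari candidate from the reverse model
    model_logprob: float   # its beam score (assumed to be a log-probability)
    source: str            # hypothetical provenance label, e.g. "dictionary"


def token_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over phone tokens rather than characters."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        curr = [i]
        for j, tb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ta != tb)))
        prev = curr
    return prev[-1]


def rerank(
    phones: str,
    candidates: List[Candidate],
    forward_g2p: Callable[[str], str],
    frequency: Dict[str, int],
    source_priority: Dict[str, float],
    weights: Tuple[float, float, float, float] = (1.0, 1.0, 0.1, 0.5),
) -> List[Candidate]:
    """Sort candidates by a hypothetical weighted sum of the four signals."""
    w_model, w_roundtrip, w_freq, w_source = weights
    target = phones.split()

    def score(c: Candidate) -> float:
        # Roundtrip: re-pronounce the spelling and compare with the input phones.
        roundtrip = token_edit_distance(forward_g2p(c.spelling).split(), target)
        freq = math.log1p(frequency.get(c.spelling, 0))  # damp large counts
        prio = source_priority.get(c.source, 0.0)
        return (w_model * c.model_logprob - w_roundtrip * roundtrip
                + w_freq * freq + w_source * prio)

    return sorted(candidates, key=score, reverse=True)


# Toy inputs; a real forward G2P model would replace this lookup table.
FORWARD = {"आज": "aa . dz ax", "आझ": "aa . dzh ax"}
ranked = rerank(
    "aa . dz ax",
    [Candidate("आझ", -0.5, "model"), Candidate("आज", -0.7, "dictionary")],
    forward_g2p=lambda s: FORWARD.get(s, ""),
    frequency={"आज": 5000},
    source_priority={"dictionary": 1.0},
)
print([c.spelling for c in ranked])  # ['आज', 'आझ']
```

In this toy example the model slightly prefers आझ. However, its roundtrip pronunciation differs from the input by one phone, so आज moves to the top.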
