Nepali Reverse G2P Linguistic v1

Learned reverse-G2P model for Nepali:

phone tokens -> ranked Devanagari spelling candidates

This checkpoint targets the spoken_nepali_linguistic profile, aligned with Wikipedia/Khatiwada-style affricate labels:

च  -> ts
छ  -> tsh
ज  -> dz
झ  -> dzh

It intentionally does not use the separate TTS affricate-rewrite profile (ch/chh/j/jh).
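If your phone strings come from the TTS profile, they need to be relabeled before decoding with this checkpoint. The following is a minimal sketch, not part of the released files. It assumes the TTS labels ch/chh/j/jh pair with च/छ/ज/झ in the order listed (the card does not spell this out), and the names `TTS_TO_LINGUISTIC` and `to_linguistic_profile` are hypothetical.

```python
# Affricate labels of the spoken_nepali_linguistic profile, as listed above.
LINGUISTIC_AFFRICATES = {"च": "ts", "छ": "tsh", "ज": "dz", "झ": "dzh"}

# Assumption: the TTS profile labels pair with the same letters in the same
# order (ch/chh/j/jh). This pairing is inferred, not documented.
TTS_AFFRICATES = {"च": "ch", "छ": "chh", "ज": "j", "झ": "jh"}

# Token-level rewrite table: TTS label -> linguistic label.
TTS_TO_LINGUISTIC = {
    TTS_AFFRICATES[letter]: LINGUISTIC_AFFRICATES[letter]
    for letter in LINGUISTIC_AFFRICATES
}


def to_linguistic_profile(phones: str) -> str:
    """Rewrite TTS affricate tokens in a space-separated phone string.

    Tokens that are not TTS affricate labels pass through unchanged.
    """
    return " ".join(TTS_TO_LINGUISTIC.get(tok, tok) for tok in phones.split())
```

For example, `to_linguistic_profile("aa . j ax")` yields the phone string used in the Usage section below.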

Files

checkpoint.pt: PyTorch checkpoint with model config, vocabularies, metrics, and state dict.
metrics.json: training/dev/test metrics.
corpus_summary.json: training corpus summary.
reverse_model.py: model definition.
predict_reverse_g2p_model.py: CLI decoder.

Metrics

Selected checkpoint:

best epoch: 19
test loss: 0.2878
test exact match: 0.7795
test character error rate: 0.0486
test rows: 957
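If the exact figure is the fraction of test rows whose prediction equals the reference spelling, 0.7795 on 957 rows corresponds to about 746 rows. The sketch below shows one common way to compute both numbers. It is an assumption, not the evaluation script behind metrics.json: it uses a micro-averaged character error rate (total edit distance over total reference length) and counts Unicode code points, so Devanagari vowel signs count as separate characters. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over Unicode code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1,             # deletion
                    curr[j - 1] + 1,         # insertion
                    prev[j - 1] + (r != h),  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def score(refs: Sequence[str], hyps: Sequence[str]) -> Tuple[float, float]:
    """Return (exact-match rate, micro-averaged character error rate)."""
    exact = sum(r == h for r, h in zip(refs, hyps)) / len(refs)
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return exact, total_edits / total_chars


# Toy strings, not rows from the test set.
exact, cer = score(["आज", "काम"], ["आज", "कम"])
```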

Usage

python predict_reverse_g2p_model.py \
  --checkpoint checkpoint.pt \
  --phones "aa . dz ax" \
  --beam-size 8 \
  --top-k 5
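To decode many phone strings from Python, the same command can be driven through `subprocess`. This is a minimal sketch: the flags are the ones shown above, but the wrapper names `build_command` and `decode` are hypothetical, and because the CLI output format is not documented here the wrapper returns stdout as raw text without parsing it.

```python
import subprocess
import sys
from typing import List


def build_command(
    phones: str,
    checkpoint: str = "checkpoint.pt",
    beam_size: int = 8,
    top_k: int = 5,
) -> List[str]:
    """Build the argv list for the CLI decoder using the flags shown above."""
    return [
        sys.executable,
        "predict_reverse_g2p_model.py",
        "--checkpoint", checkpoint,
        "--phones", phones,
        "--beam-size", str(beam_size),
        "--top-k", str(top_k),
    ]


def decode(phones: str, **kwargs) -> str:
    """Run the decoder and return its stdout unparsed.

    Raises subprocess.CalledProcessError if the CLI exits with an error.
    """
    result = subprocess.run(
        build_command(phones, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with phone strings that contain spaces or dots.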

Limitations

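Nepali reverse G2P is one-to-many: a single phone sequence can correspond to several plausible Devanagari spellings. Use this model as a candidate generator, then rerank its candidates by forward-G2P roundtrip distance, word frequency, and source priority before promoting any spelling to the lexicon.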
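The reranking step could be sketched as below. Everything here is an assumption for illustration: `forward_g2p`, `word_freq`, and `source_priority` are hypothetical callables you would supply, the roundtrip distance is a token-level edit distance, and the linear score with these weights is one possible combination, not a tuned or released recipe.

```python
import math
from typing import Callable, List, Sequence


def token_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over phone tokens."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def rerank(
    candidates: Sequence[str],
    target_phones: Sequence[str],
    forward_g2p: Callable[[str], Sequence[str]],   # hypothetical: spelling -> phones
    word_freq: Callable[[str], float],             # hypothetical: corpus count
    source_priority: Callable[[str], float],       # hypothetical: higher is better
    w_dist: float = 1.0,
    w_freq: float = 0.5,
    w_prio: float = 0.5,
) -> List[str]:
    """Sort candidates by a linear score, best first."""

    def score(cand: str) -> float:
        dist = token_edit_distance(forward_g2p(cand), target_phones)
        return (
            -w_dist * dist
            + w_freq * math.log1p(word_freq(cand))
            + w_prio * source_priority(cand)
        )

    return sorted(candidates, key=score, reverse=True)


# Toy stand-ins, not real model output or corpus counts.
toy_g2p = {"आज": ["aa", "dz"], "आझ": ["aa", "dzh"]}
toy_freq = {"आज": 120.0}
ranked = rerank(
    ["आझ", "आज"],
    ["aa", "dz"],
    forward_g2p=lambda w: toy_g2p[w],
    word_freq=lambda w: toy_freq.get(w, 0.0),
    source_priority=lambda w: 0.0,
)
```

Keeping the three signals as separate callables makes it easy to swap in a different forward-G2P system or frequency list without touching the scoring logic.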
