Nepali Reverse G2P Linguistic v1
A learned reverse grapheme-to-phoneme (G2P) model for Nepali:
phone tokens -> ranked Devanagari spelling candidates
This checkpoint targets the spoken_nepali_linguistic profile, aligned with
Wikipedia/Khatiwada-style affricate labels:
च -> ts
छ -> tsh
ज -> dz
झ -> dzh
It intentionally does not use the separate TTS affricate-rewrite profile
(ch/chh/j/jh).
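Phone strings produced under the TTS affricate-rewrite profile would need to be rewritten into this checkpoint's labels before decoding. Below is a minimal sketch. It assumes the TTS labels ch/chh/j/jh correspond one-to-one, in the order listed, to ts/tsh/dz/dzh, and that inputs are space-separated tokens as in the usage example. `to_linguistic_profile` is a hypothetical helper and is not part of the released files.

```python
# Hypothetical helper (not shipped with this checkpoint).
# ASSUMPTION: TTS-profile labels map one-to-one onto linguistic-profile labels
# in the order the README lists them.
TTS_TO_LINGUISTIC = {"ch": "ts", "chh": "tsh", "j": "dz", "jh": "dzh"}


def to_linguistic_profile(phones: str) -> str:
    """Rewrite space-separated TTS-profile phone tokens into linguistic-profile tokens."""
    # Replace whole tokens only, so "chh" is never partially matched as "ch".
    return " ".join(TTS_TO_LINGUISTIC.get(tok, tok) for tok in phones.split())
```

For example, `to_linguistic_profile("aa . j ax")` returns `"aa . dz ax"`, the input used in the usage example.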
Files
checkpoint.pt: PyTorch checkpoint with model config, vocabularies, metrics, and state dict.
metrics.json: training/dev/test metrics.
corpus_summary.json: training corpus summary.
reverse_model.py: model definition.
predict_reverse_g2p_model.py: CLI decoder.
Metrics
Selected checkpoint:
best epoch: 19
test loss: 0.2878
test exact match: 0.7795
test character error rate: 0.0486
test rows: 957
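The two headline test numbers can be computed from (prediction, reference) pairs roughly as follows. This sketch makes two assumptions. It treats "exact" as the top-1 exact-match rate. It treats CER as total Levenshtein edits divided by total reference length, counted over Unicode code points, so a vowel sign such as ा counts as one character. The training code may define either metric differently, for example by averaging CER per row. `edit_distance` and `exact_and_cer` are hypothetical names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute each cost 1) over code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def exact_and_cer(predictions, references):
    """Top-1 exact-match rate and corpus-level CER (total edits / total reference chars).

    ASSUMPTION: this mirrors, but is not, the project's own metric code.
    """
    assert len(predictions) == len(references) and references
    exact = sum(p == r for p, r in zip(predictions, references)) / len(references)
    edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    chars = sum(len(r) for r in references)
    return exact, edits / chars
```

For example, `exact_and_cer(["आज", "काम"], ["आज", "कम"])` gives an exact-match rate of 0.5 and a CER of 0.25 (one deleted vowel sign out of four reference code points).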
Usage
python predict_reverse_g2p_model.py \
--checkpoint checkpoint.pt \
--phones "aa . dz ax" \
--beam-size 8 \
--top-k 5
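The `--beam-size` and `--top-k` flags suggest that the decoder runs a beam search and returns the best few completed spellings. The sketch below shows that idea generically. It assumes a `step` callable that returns next-character probabilities for a given prefix, and an end-of-sequence token `</s>`. It is not the code in predict_reverse_g2p_model.py. `beam_search`, `toy_step`, and the toy probabilities are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"  # ASSUMPTION: end-of-sequence marker used by the hypothetical step function


def beam_search(step: Callable[[Tuple[str, ...]], Dict[str, float]],
                beam_size: int = 8, top_k: int = 5,
                max_len: int = 20) -> List[Tuple[str, float]]:
    """Keep the `beam_size` best prefixes by summed log-probability at each step;
    return up to `top_k` finished hypotheses, best first."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    finished: List[Tuple[str, float]] = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, prob in step(prefix).items():
                if prob <= 0.0:
                    continue
                new_score = score + math.log(prob)
                if token == EOS:
                    finished.append(("".join(prefix), new_score))
                else:
                    candidates.append((prefix + (token,), new_score))
        if not candidates:
            break
        # Prune to the beam: only the highest-scoring prefixes are expanded next.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:top_k]


def toy_step(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Toy next-character distribution; any unseen prefix ends the word."""
    table = {
        (): {"आ": 0.9, "अ": 0.1},
        ("आ",): {"ज": 0.6, "झ": 0.4},
        ("अ",): {"ज": 1.0},
    }
    return table.get(prefix, {EOS: 1.0})
```

With `beam_search(toy_step, beam_size=8, top_k=2)` the two returned spellings are आज and आझ. With `beam_size=1` only the greedy path आज survives, which is why a wider beam matters for a one-to-many task.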
Limitations
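Nepali reverse G2P is one-to-many: a single phone sequence can correspond to several valid Devanagari spellings. Use this model as a candidate generator. Then rerank its candidates by forward-G2P roundtrip distance, word frequency, and source priority before promoting any of them to the lexicon.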
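One way to implement the reranking step is to give each candidate a weighted score. The sketch below assumes three inputs: a caller-supplied forward G2P function that returns space-separated phones in the same profile, a word-frequency table, and a per-word source-priority value. `rerank`, the default weights, and the toy pronunciations in the test are all hypothetical. The weights would need tuning against held-out lexicon entries.

```python
import math
from typing import Callable, Dict, List


def normalized_edit_distance(a: List[str], b: List[str]) -> float:
    """Levenshtein distance over phone tokens, divided by the longer sequence length."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1] / max(len(a), len(b), 1)


def rerank(phones: str, candidates: List[str],
           forward_g2p: Callable[[str], str],
           freq: Dict[str, int], source_priority: Dict[str, float],
           w_rt: float = 1.0, w_freq: float = 0.1, w_src: float = 0.5) -> List[str]:
    """Sort candidates best-first by a hypothetical weighted score:
    lower roundtrip distance, higher frequency and higher source priority all help."""
    target = phones.split()

    def score(word: str) -> float:
        roundtrip = normalized_edit_distance(forward_g2p(word).split(), target)
        return (-w_rt * roundtrip
                + w_freq * math.log1p(freq.get(word, 0))
                + w_src * source_priority.get(word, 0.0))

    return sorted(candidates, key=score, reverse=True)
```

Frequency enters as `log1p(count)`, so a very common word can still outrank a candidate with a better roundtrip. The weights control that trade-off.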