Russian G2P token classification model
This is a non-autoregressive model for Russian grapheme-to-phoneme (G2P) conversion based on the BERT architecture. It predicts phonemes in IPA format. The initial data was built from the Wiktionary JSON at https://kaikki.org/dictionary/Russian/index.html
Intended uses & limitations
The input is expected to consist of Cyrillic letters separated by spaces. Real spaces should be replaced with an underscore (_). Note that the model was trained on single words and some short phrases. It can accept longer phrases, but its accuracy may degrade on them.
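The input format described above can be produced with a small helper. This is a minimal sketch, not part of the model's tooling: the function name `to_model_input` is hypothetical, and lowercasing is an assumption based on the lowercase examples in this card.

```python
def to_model_input(text: str) -> str:
    """Format a word or short phrase as space-separated characters.

    Real spaces become "_" so they survive the character-level split.
    Assumption: the model expects lowercase input, as in the card's examples.
    """
    # Collapse runs of whitespace into a single underscore.
    normalized = "_".join(text.lower().split())
    # Put one space between every pair of characters.
    return " ".join(normalized)


if __name__ == "__main__":
    for word in ["исход", "ганс-юрген"]:
        print(to_model_input(word))
```

Writing one formatted item per line to a text file gives a file in the shape of the input example shown later in this card.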
How to use
Install NeMo.
Download ru_g2p.nemo (this model)
git lfs install
git clone https://huggingface.co/bene-ges/ru_g2p_ipa_bert_large
Run
python ${NEMO_ROOT}/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_infer.py \
pretrained_model=ru_g2p_ipa_bert_large/ru_g2p.nemo \
inference.from_file=input.txt \
inference.out_file=output.txt \
model.max_sequence_len=512 \
inference.batch_size=128 \
lang=ru
Example of input file:
и с х о д
т р а н с н е п т у н о в ы х
т е л я т н и к о в с к о е
ц а р с к о г о
к р о с х о ф
г а н с - ю р г е н
д а р д а н е л л
Example of output file:
ɪ s x 'o t и с х о д ɪ s x 'o t ɪ s x 'o t PLAIN PLAIN PLAIN PLAIN PLAIN
t r a nʲ sʲ nʲ ɪ p t 'u n ə v ɨ x т р а н с н е п т у н о в ы х t r a nʲ sʲ nʲ ɪ p t 'u n ə v ɨ x t r a nʲ sʲ nʲ ɪ p t 'u n ə v ɨ x PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
tʲ ɪ lʲ 'æ tʲ nʲ ɪ k ə f s k ə jə т е л я т н и к о в с к о е tʲ ɪ lʲ 'æ tʲ nʲ ɪ k ə f s k ə jə tʲ ɪ lʲ 'æ tʲ nʲ ɪ k ə f s k ə jə PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
t~s 'a r s k ə v ə ц а р с к о г о t~s 'a r s k ə v ə t~s 'a r s k ə v ə PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
k r ɐ s x 'o f к р о с х о ф k r ɐ s x 'o f k r ɐ s x 'o f PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
ɡ a n s 'ju r ɡʲ ɪ n г а н с - ю р г е н ɡ a n s _ 'ju r ɡʲ ɪ n ɡ a n s _ 'ju r ɡʲ ɪ n PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
d ə r d ɐ n 'ɛ ɫ д а р д а н е л л d ə r d ɐ n 'ɛ ɫ <DELETE> d ə r d ɐ n 'ɛ ɫ <DELETE> PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
Note that the correct output tags are in the third column; the input is in the second column.
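A minimal sketch of pulling those two columns out of one output line follows. It assumes the columns are tab-separated, which the flattened example above does not show, so pass a different `sep` if your file differs. The names `G2PRow` and `parse_output_line` are hypothetical.

```python
from typing import NamedTuple


class G2PRow(NamedTuple):
    input_letters: str  # second column: space-separated input characters
    tags: str           # third column: predicted output tags


def parse_output_line(line: str, sep: str = "\t") -> G2PRow:
    """Split one output line into columns and keep the input and tag columns.

    Assumption: columns are separated by `sep` (a tab by default).
    """
    columns = line.rstrip("\n").split(sep)
    if len(columns) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(columns)}")
    return G2PRow(input_letters=columns[1], tags=columns[2])
```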
Tags correspond to input letters in a one-to-one fashion. If you remove the <DELETE> tag, +, ~, and spaces, you should get an IPA-like transcription.
The model does not predict secondary stress. The primary stress mark (') is placed directly before the stressed vowel. In some cases the stress mark can be missing.
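The tag clean-up described above can be sketched as a short function. The name `tags_to_ipa` is hypothetical. The function removes only the symbols this card lists and leaves everything else untouched, including the stress mark and any underscore, since the card does not say how to treat those.

```python
def tags_to_ipa(tags: str) -> str:
    """Collapse a tag sequence into an IPA-like string.

    Removes the <DELETE> tag, "+", "~", and spaces.
    The stress mark "'" and any other symbol (for example "_") are kept.
    """
    for piece in ("<DELETE>", "+", "~", " "):
        tags = tags.replace(piece, "")
    return tags


if __name__ == "__main__":
    print(tags_to_ipa("t~s 'a r s k ə v ə"))         # ts'arskəvə
    print(tags_to_ipa("d ə r d ɐ n 'ɛ ɫ <DELETE>"))  # dərdɐn'ɛɫ
```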
How to use for TTS
See an example of the inference pipeline for G2P + FastPitch + HifiGAN in this notebook.