cisco-ai/mini-bart-g2p

Model Summary

mini-bart-g2p is a sequence-to-sequence (seq2seq) model based on the BART architecture. We pare down the number of layers and attention heads in the original BART architecture so that the model can be trained reliably for the grapheme-to-phoneme (G2P) conversion task.

Intended Uses

The input is expected to contain English words consisting of Latin letters and certain punctuation symbols. The model has been trained to take a single word at a time as input and will return unexpected results when fed multiple words as a single input. The Hugging Face tokenizer provided with the model is configured to lowercase each word and separate its letters with spaces under the hood, so you can pass words to the model as they are normally written, without separating the letters yourself.
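For intuition only, the normalization described above is roughly equivalent to the following sketch. The function name `normalize_word` is hypothetical and the real tokenizer's implementation may differ; there is no need to apply this yourself before calling the model.

```python
def normalize_word(word: str) -> str:
    """Lowercase a word and separate its characters with single spaces.

    Illustrative only: this mimics the behavior the tokenizer is described
    as having, it is not the tokenizer's actual code.
    """
    return " ".join(word.lower())


print(normalize_word("Hello"))  # h e l l o
```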

The model provides output in the form of phonemes along with their corresponding stress numbers. It is also capable of generating phonemes for words that may be hyphenated or have apostrophes present.
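If the stress numbers are needed separately from the phoneme symbols, the output string can be post-processed. The sketch below assumes ARPAbet-style output as used by CMUDict, where a vowel symbol may carry a trailing stress digit (0, 1, or 2) and consonants carry none. The helper name `split_stress` is hypothetical and not part of the model or the Transformers library.

```python
import re
from typing import List, Optional, Tuple

# A phoneme token is uppercase letters, optionally followed by one stress digit.
_PHONEME_RE = re.compile(r"^([A-Z]+)([0-2])?$")


def split_stress(phonemes: str) -> List[Tuple[str, Optional[int]]]:
    """Split a space-separated phoneme string into (symbol, stress) pairs.

    Stress is None for tokens without a stress digit (typically consonants).
    """
    pairs = []
    for token in phonemes.split():
        match = _PHONEME_RE.match(token)
        if match is None:
            raise ValueError(f"Unexpected phoneme token: {token!r}")
        symbol, stress = match.groups()
        pairs.append((symbol, int(stress) if stress is not None else None))
    return pairs


print(split_stress("W ER1 L D"))
# [('W', None), ('ER', 1), ('L', None), ('D', None)]
```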

How to Use

from transformers import pipeline

pipe = pipeline(task="text2text-generation", model="cisco-ai/mini-bart-g2p")

text = "hello world"
# Do NOT call pipe(text) directly: the model expects one word per input, so this will produce unexpected results.

pipe(text.split())
# [{'translation_text': 'HH EH1 L OW0'}, {'translation_text': 'W ER1 L D'}]

text = "co-workers coworkers hunter's hunter"
pipe(text.split())

# [{'translation_text': 'K OW1 W ER1 K ER0 Z'}, {'translation_text': 'K OW1 W ER1 K ER0 Z'}, {'translation_text': 'HH AH1 N T ER0 Z'}, {'translation_text': 'HH AH1 N T ER0'}]
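The split-then-call pattern above can be wrapped in a small helper so that a whole sentence is never passed as one input by accident. This is a minimal sketch: `phonemize_sentence` is a hypothetical name, and `pipe` is assumed to be any callable that takes a list of words and returns one dictionary per word, as the pipeline above does. The result key is read defensively because it can differ between pipeline tasks; that is an assumption made for robustness, not something the model card states.

```python
from typing import Callable, List, Tuple


def phonemize_sentence(
    pipe: Callable[[List[str]], List[dict]], text: str
) -> List[Tuple[str, str]]:
    """Run the G2P pipeline one word at a time and pair each word with its phonemes."""
    words = text.split()
    if not words:
        return []
    results = pipe(words)
    pairs = []
    for word, result in zip(words, results):
        # Take whichever text field the pipeline returned.
        phonemes = result.get("translation_text", result.get("generated_text"))
        pairs.append((word, phonemes))
    return pairs
```

With the pipeline created earlier, `phonemize_sentence(pipe, "hello world")` would return one `(word, phonemes)` pair per word, in sentence order.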

Training

The mini-bart-g2p model was trained on a combination of the Librispeech Alignments dataset and the CMUDict dataset, using the translation training script provided in the Hugging Face Transformers repository. The following parameters were passed to the training script to produce the model.

Training script parameters

python run_translation.py \
--model_name_or_path <MODEL DIR> \
--source_lang wrd \
--target_lang phon \
--num_train_epochs 500 \
--train_file <TRAIN SPLIT> \
--validation_file <VAL SPLIT> \
--test_file <TEST SPLIT> \
--num_beams 5 \
--generation_num_beams 5 \
--max_source_length 128 \
--max_target_length 128 \
--overwrite_cache \
--overwrite_output_dir \
--do_train \
--do_eval \
--do_predict \
--evaluation_strategy epoch \
--eval_delay 3 \
--save_strategy epoch \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 16 \
--learning_rate 5e-4 \
--label_smoothing_factor 0.1 \
--weight_decay 0.00001 \
--adam_beta1 0.9 \
--adam_beta2 0.98 \
--load_best_model_at_end True \
--predict_with_generate True \
--generation_max_length 20 \
--output_dir <OUTPUT DIR> \
--seed 4664427 \
--lr_scheduler_type cosine_with_restarts \
--warmup_steps 120000 \
--optim adafactor \
--group_by_length \
--metric_for_best_model bleu \
--greater_is_better True \
--save_total_limit 10 \
--log_level info \
--logging_steps 500
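The translation script reads custom data as JSON Lines files in which every line is a dictionary with a `translation` key that maps each language code to its text. With `--source_lang wrd` and `--target_lang phon`, each record would therefore look like `{"translation": {"wrd": ..., "phon": ...}}`. The sketch below shows one way such a file could be built from CMUDict-style lines (`WORD  PH1 PH2 ...`, with `;;;` comment lines and `(1)`-style variant suffixes). How the authors actually prepared their splits is not described, so the parsing rules and the helper names `cmudict_line_to_record` and `write_jsonl` are assumptions for illustration.

```python
import json
import re
from typing import Iterable, Optional

# Alternate pronunciations are marked like "HELLO(1)"; drop the "(1)" suffix.
_VARIANT_RE = re.compile(r"\(\d+\)$")


def cmudict_line_to_record(line: str) -> Optional[dict]:
    """Convert one CMUDict-style line into a run_translation.py record.

    Returns None for blank lines, ';;;' comments, and lines without phonemes.
    """
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    parts = line.split()
    if len(parts) < 2:
        return None
    word = _VARIANT_RE.sub("", parts[0]).lower()
    return {"translation": {"wrd": word, "phon": " ".join(parts[1:])}}


def write_jsonl(lines: Iterable[str], path: str) -> int:
    """Write one JSON record per usable input line; return the record count."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            record = cmudict_line_to_record(line)
            if record is not None:
                handle.write(json.dumps(record) + "\n")
                count += 1
    return count
```

The resulting files would be passed as `--train_file`, `--validation_file`, and `--test_file`.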

Limitations

The model has some limitations in its current form, which we list for full transparency.

The mini-bart-g2p model is trained to only work on the English language.
The model does not behave consistently when punctuation symbols other than apostrophes are part of the input word. We recommend stripping all non-essential punctuation from the words before running them through the pipeline.

text = "world world!"
pipe(text.split())
# [{'translation_text': 'W ER1 L D'}, {'translation_text': 'W ER1 L D F'}]
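One way to apply this recommendation is to clean each word before it reaches the pipeline. The sketch below treats apostrophes and hyphens as essential, since the model is described as handling both, and drops every other non-letter character. That choice, and the helper names `clean_word` and `clean_words`, are assumptions for illustration rather than the authors' preprocessing.

```python
import re
from typing import List

# Keep ASCII letters, apostrophes, and hyphens; everything else is removed.
_DISALLOWED_RE = re.compile(r"[^a-zA-Z'-]")


def clean_word(word: str) -> str:
    """Remove non-essential punctuation from a single word."""
    cleaned = _DISALLOWED_RE.sub("", word)
    # Leading or trailing hyphens are usually dash remnants, not part of the word.
    return cleaned.strip("-")


def clean_words(text: str) -> List[str]:
    """Split text on whitespace, clean each word, and drop words that end up empty."""
    return [w for w in (clean_word(t) for t in text.split()) if w]


print(clean_words("world world!"))  # ['world', 'world']
```

The cleaned list can then be passed to the pipeline in place of `text.split()`.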

License

The model is licensed under the Apache 2.0 License.