---
license: apache-2.0
language:
- en
tags:
- g2p
- cisco
- Grapheme-to-Phoneme
pipeline_tag: text2text-generation
---

## Model Summary

`mini-bart-g2p` is a seq2seq model based on the [BART architecture](https://arxiv.org/abs/1910.13461). We pare down the number of layers and attention heads in the original BART architecture so that the model can be trained reliably for the grapheme-to-phoneme conversion task.

## Intended Uses

The input is expected to contain English words consisting of Latin letters and certain punctuation symbols. The model has been trained to take a single word at a time as input and _will return unexpected results when fed multiple words as a single input_. The provided [Hugging Face tokenizer](https://huggingface.co/cisco-ai/mini-bart-g2p/blob/main/tokenizer.json) is configured to lowercase each word and separate its letters with spaces under the hood, so you can pass words as they are normally written, without separating the letters yourself.

The model outputs phonemes along with their corresponding stress numbers. It can also generate phonemes for words that are hyphenated or contain apostrophes.

## How to Use

```python
from transformers import pipeline

pipe = pipeline(task="text2text-generation", model="cisco-ai/mini-bart-g2p")

text = "hello world"
# Do NOT call pipe(text) directly, as this will produce unexpected results.
# Split the text and pass one word per input instead.
pipe(text.split())
# [{'translation_text': 'HH EH1 L OW0'}, {'translation_text': 'W ER1 L D'}]

text = "co-workers coworkers hunter's hunter"
pipe(text.split())
# [{'translation_text': 'K OW1 W ER1 K ER0 Z'}, {'translation_text': 'K OW1 W ER1 K ER0 Z'}, {'translation_text': 'HH AH1 N T ER0 Z'}, {'translation_text': 'HH AH1 N T ER0'}]
```

## Training

The `mini-bart-g2p` model was trained on a combination of the [Librispeech Alignments dataset](https://zenodo.org/records/2619474#.YuCdaC8r1ZF) and the [CMUDict dataset](https://github.com/cmusphinx/cmudict).
The model was trained using the [translation training script](https://github.com/huggingface/transformers/blob/main/examples/pytorch/translation/run_translation.py) provided in the Hugging Face Transformers repository. The following parameters were passed to the training script to produce the model.

Training script parameters (the values of `--model_name_or_path`, `--train_file`, `--validation_file`, `--test_file`, and `--output_dir` are omitted):

```bash
python run_translation.py \
    --model_name_or_path \
    --source_lang wrd \
    --target_lang phon \
    --num_train_epochs 500 \
    --train_file \
    --validation_file \
    --test_file \
    --num_beams 5 \
    --generation_num_beams 5 \
    --max_source_length 128 \
    --max_target_length 128 \
    --overwrite_cache \
    --overwrite_output_dir \
    --do_train \
    --do_eval \
    --do_predict \
    --evaluation_strategy epoch \
    --eval_delay 3 \
    --save_strategy epoch \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --learning_rate 5e-4 \
    --label_smoothing_factor 0.1 \
    --weight_decay 0.00001 \
    --adam_beta1 0.9 \
    --adam_beta2 0.98 \
    --load_best_model_at_end True \
    --predict_with_generate True \
    --generation_max_length 20 \
    --output_dir \
    --seed 4664427 \
    --lr_scheduler_type cosine_with_restarts \
    --warmup_steps 120000 \
    --optim adafactor \
    --group_by_length \
    --metric_for_best_model bleu \
    --greater_is_better True \
    --save_total_limit 10 \
    --log_level info \
    --logging_steps 500
```

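The `--source_lang wrd` and `--target_lang phon` flags suggest how the data files were keyed. As a minimal sketch, assuming the JSON Lines layout that `run_translation.py` reads for custom files (one `{"translation": {...}}` object per line) and CMUdict-style entries (`word PH1 PH2 ...`, alternate pronunciations marked `word(2)`, optional trailing `#` comments), training records could be produced roughly as follows. The helper names and the choice to keep alternate pronunciations as separate records are illustrative assumptions, not the authors' actual preprocessing.

```python
import json
import re

# Matches the "(2)", "(3)", ... suffix CMUdict uses for alternate pronunciations.
_VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def parse_cmudict_line(line):
    """Return (word, phonemes) for one CMUdict-style line, or None to skip it."""
    if line.startswith(";;;"):  # comment line in older CMUdict releases
        return None
    line = line.split("#", 1)[0].strip()  # drop a trailing "# ..." comment
    parts = line.split(None, 1)  # word, then the rest of the line
    if len(parts) < 2:
        return None  # blank line or entry without phonemes
    head, phonemes = parts
    word = _VARIANT_SUFFIX.sub("", head).lower()
    return word, " ".join(phonemes.split())


def to_jsonl_record(word, phonemes, source_lang="wrd", target_lang="phon"):
    """Serialize one pair using keys that match --source_lang / --target_lang."""
    return json.dumps({"translation": {source_lang: word, target_lang: phonemes}})


if __name__ == "__main__":
    # Hypothetical sample entries, written in the CMUdict layout.
    sample = [
        "hello HH AH0 L OW1",
        "a(2) EY1",
        ";;; a comment line",
    ]
    for raw in sample:
        parsed = parse_cmudict_line(raw)
        if parsed is not None:
            print(to_jsonl_record(*parsed))
```

Writing the printed lines to a `.json` or `.jsonl` file would give something that can be passed as `--train_file`; whether the original training data was split or filtered further is not stated in this card.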
## Limitations

The model has some limitations in its current form, which we list for full transparency.

- The `mini-bart-g2p` model is trained to work only on the English language.
- The model does not behave consistently when punctuation symbols other than apostrophes are part of the input word. **We recommend stripping all non-essential punctuation symbols from the words before running them through the pipeline.**

```python
text = "world world!"
pipe(text.split())
# [{'translation_text': 'W ER1 L D'}, {'translation_text': 'W ER1 L D F'}]
```

### License

The model is licensed under the [Apache 2.0 License](https://huggingface.co/cisco-ai/mini-bart-g2p/blob/main/LICENSE).
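
Returning to the recommendation under Limitations to strip non-essential punctuation: the sketch below shows one way to prepare input before calling the pipeline. It assumes that apostrophes and hyphens inside a word are the only punctuation worth keeping (the two cases this card says the model handles) and that every other ASCII punctuation character can be dropped. The `clean_words` helper is hypothetical and not part of the model repository.

```python
import string

# Punctuation the model card says the model handles inside a word.
KEEP = "'-"
# Every other ASCII punctuation character is treated as non-essential here.
STRIP_TABLE = str.maketrans(
    "", "", "".join(c for c in string.punctuation if c not in KEEP)
)


def clean_words(text):
    """Split text on whitespace and strip non-essential punctuation per word."""
    words = []
    for token in text.split():
        # Remove disallowed punctuation, then trim apostrophes and hyphens
        # left dangling at the edges (e.g. quotes around a word). Note that
        # this also trims a trailing possessive apostrophe such as "dogs'".
        word = token.translate(STRIP_TABLE).strip(KEEP)
        if word:
            words.append(word)
    return words
```

With the `pipe` object from How to Use, `pipe(clean_words("world world!"))` would then receive two clean single-word inputs instead of `world!`.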