---
language:
- fr
thumbnail: "url to a thumbnail used in social sharing"
tags:
- text-generation
datasets:
- Marxav/frpron
metrics:
- loss/eval
- perplexity
widget:
- text: "bonjour:"
- text: "salut, comment ça va:"
- text: "Louis XIII:"
- text: "anticonstitutionnellement:"
- text: "les animaux:"
inference:
  parameters:
    temperature: 0.01
    return_full_text: True
---

# Fr-word to phonemic pronunciation

This model predicts the syllabified phonemic pronunciation of French words.

The generated pronunciation is:
* A text string made of International Phonetic Alphabet (IPA) characters;
* Phonemic (i.e. it stays at the phoneme level and does not go deeper);
* Syllabified (i.e. the characters '.' and '‿' are used to identify syllables).

This style of pronunciation is used in the [French Wiktionary](https://fr.wiktionary.org/) in the `{{pron|...|fr}}` tag.

To use this model, give it an input containing the word that you want to transcribe followed by ":", for example: "bonjour:". It will generate its predicted pronunciation, for example "bɔ̃.ʒuʁ".

This model remains experimental. Additional finetuning is needed for:
* [Homographs with different pronunciations](https://fr.wiktionary.org/wiki/Catégorie:Homographes_non_homophones_en_français),
* [French liaisons](https://en.wikipedia.org/wiki/Liaison_(French)),
* [Roman numerals](https://en.wikipedia.org/wiki/Roman_numerals).

The input length is currently limited to a maximum of 60 letters.

This work is derived from the [OTEANN paper](https://aclanthology.org/2021.sigtyp-1.1/) and [code](https://github.com/marxav/oteann3), which used [minGPT](https://github.com/karpathy/minGPT).

## More information on the model, dataset, hardware, environmental consideration:

### **The training data**

The dataset used to train this model comes from the [French Wiktionary](https://fr.wiktionary.org/).

### **The model**

The model is built on [gpt2](https://huggingface.co/gpt2)
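
As an illustration of the input convention described in this card (the word followed by ":", at most 60 letters), here is a minimal sketch of two helpers for preparing a prompt and reading the result. The names `build_prompt` and `extract_pronunciation` are hypothetical and not part of this model or any library. The sketch assumes that the 60-letter limit applies to the word without the trailing ":", and that the generated text starts with the prompt, as suggested by `return_full_text: True` in the inference parameters.

```python
MAX_INPUT_LETTERS = 60  # limit stated in this card; assumed to exclude the trailing ":"


def build_prompt(word: str) -> str:
    """Return the model input for `word`: the word followed by ':'."""
    word = word.strip()
    if not word:
        raise ValueError("input word is empty")
    if len(word) > MAX_INPUT_LETTERS:
        raise ValueError(
            f"input has {len(word)} characters; the limit is {MAX_INPUT_LETTERS}"
        )
    return word + ":"


def extract_pronunciation(generated_text: str, prompt: str) -> str:
    """Remove the echoed prompt from the generated text, if present."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.strip()


if __name__ == "__main__":
    prompt = build_prompt("salut")
    # Stand-in for a model output; a real call would pass `prompt` to the model.
    generated = prompt + "sa.ly"
    print(extract_pronunciation(generated, prompt))
```

The string passed to `extract_pronunciation` here is a hand-written stand-in, not an actual model output; in practice it would be the text returned by whichever generation API is used to call the model.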