frpron / README.md
Marxav's picture
return_full_text : False -> True
417c1c3
---
language:
- fr
thumbnail: "url to a thumbnail used in social sharing"
tags:
- text-generation
datasets:
- Marxav/frpron
metrics:
- loss/eval
- perplexity
widget:
- text: "bonjour:"
- text: "salut, comment ça va:"
- text: "Louis XIII:"
- text: "anticonstitutionnellement:"
- text: "les animaux:"
inference:
parameters:
temperature: 0.01
return_full_text: True
---
# Fr-word to phonemic pronunciation
This model aims at predicting the syllabized phonemic pronunciation of the French words.
The generated pronunciation is:
* A text string made of International Phonetic Alphabet (IPA) characters;
* Phonemic (i.e. remains at the phoneme-level, not deeper);
* Syllabized (i.e. characters '.' and '‿' are used to identify syllabes).
Such pronunciation is used in the [French Wiktionary](https://fr.wiktionary.org/) in the {{pron|...|fr}} tag.
To use this model, simply give an input containing the word that you want to translate followed by ":", for example: "bonjour:". It will generate its predicted pronunciation, for example "bɔ̃.ʒuʁ".
This model remains experimental. Additional finetuning is needed for:
* [Homographs with different pronunciations](https://fr.wiktionary.org/wiki/Catégorie:Homographes_non_homophones_en_français),
* [French liaisons](https://en.wikipedia.org/wiki/Liaison_(French)),
* [Roman numerals](https://en.wikipedia.org/wiki/Roman_numerals).
The input length is currently limited to a maximum of 60 letters.
This work is derived from the [OTEANN paper](https://aclanthology.org/2021.sigtyp-1.1/) and [code](https://github.com/marxav/oteann3), which used [minGTP](https://github.com/karpathy/minGPT).
## More information on the model, dataset, hardware, environmental consideration:
### **The training data**
The dataset used for training this models comes from data of the [French Wiktionary](https://fr.wiktionary.org/).
### **The model**
The model is build on [gpt2](https://huggingface.co/gpt2)