---
language:
- fr
thumbnail: "url to a thumbnail used in social sharing"
tags:
- text-generation
datasets:
- Marxav/frpron
metrics:
- loss/eval
- perplexity
widget:
- text: "bonjour:"
- text: "salut, comment ça va:"
- text: "Louis XIII:"
- text: "anticonstitutionnellement:"
- text: "les animaux:"
inference:
parameters:
temperature: 0.01
    return_full_text: true
---
# Fr-word to phonemic pronunciation
This model predicts the syllabized phonemic pronunciation of French words.
The generated pronunciation is:
* A text string made of International Phonetic Alphabet (IPA) characters;
* Phonemic (i.e. remains at the phoneme-level, not deeper);
* Syllabized (i.e. the characters '.' and '‿' mark syllable boundaries).
Such pronunciations are used in the [French Wiktionary](https://fr.wiktionary.org/) in the {{pron|...|fr}} template.
To use this model, give it an input containing the word you want to transcribe followed by ":", for example "bonjour:". The model then generates its predicted pronunciation, for example "bɔ̃.ʒuʁ".
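The prompt format and output parsing described above can be sketched as follows. The helper functions are illustrative (not part of the model's API), and the commented-out `pipeline` call assumes a hypothetical repository id, so adjust it to the actual model name before running:

```python
def make_prompt(word: str) -> str:
    """Format a word as the model expects: the word followed by ':'."""
    return f"{word.strip()}:"

def extract_pronunciation(generated: str) -> str:
    """Split the model's full-text output ('word:pronunciation') at the first ':'."""
    return generated.split(":", 1)[1].strip()

# Querying the model with the transformers library (repo id below is a placeholder):
# from transformers import pipeline
# generator = pipeline("text-generation", model="Marxav/frpron")
# out = generator(make_prompt("bonjour"), return_full_text=True)[0]["generated_text"]
# pronunciation = extract_pronunciation(out)
```

Because the widget is configured with `return_full_text: true`, the generated string includes the original prompt, which is why the output is split at the first ":".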
This model remains experimental. Additional finetuning is needed for:
* [Homographs with different pronunciations](https://fr.wiktionary.org/wiki/Catégorie:Homographes_non_homophones_en_français),
* [French liaisons](https://en.wikipedia.org/wiki/Liaison_(French)),
* [Roman numerals](https://en.wikipedia.org/wiki/Roman_numerals).
The input length is currently limited to a maximum of 60 letters.
This work is derived from the [OTEANN paper](https://aclanthology.org/2021.sigtyp-1.1/) and [code](https://github.com/marxav/oteann3), which used [minGPT](https://github.com/karpathy/minGPT).
## More information on the model, dataset, hardware, and environmental considerations
### **The training data**
The dataset used to train this model comes from the [French Wiktionary](https://fr.wiktionary.org/).
### **The model**
The model is built on [gpt2](https://huggingface.co/gpt2).