---
language:
- fr
thumbnail: "url to a thumbnail used in social sharing"
tags:
- text-generation
datasets:
- Marxav/frpron
metrics:
- loss/eval
- perplexity
widget:
- text: "bonjour:"
- text: "salut, comment ça va:"
- text: "Louis XIII:"
- text: "anticonstitutionnellement:"
- text: "les animaux:"
inference:
parameters:
temperature: 0.01
    return_full_text: true
---
# Fr-word to phonemic pronunciation
This model predicts the syllabized phonemic pronunciation of French words.
The generated pronunciation is:
* A text string made of International Phonetic Alphabet (IPA) characters;
* Phonemic (i.e. remains at the phoneme-level, not deeper);
* Syllabized (i.e. the characters '.' and '‿' mark syllable boundaries).
Such pronunciations are used in the [French Wiktionary](https://fr.wiktionary.org/) in the {{pron|...|fr}} template.
To use this model, give it an input containing the word you want to transcribe followed by ":", for example "bonjour:". The model then generates its predicted pronunciation, for example "bɔ̃.ʒuʁ".
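The prompt format and output parsing described above can be sketched as follows. The helper functions are illustrative (not part of the model's API), and the commented-out `pipeline` call assumes a hypothetical repository id, so adjust it to the actual model name before running:

```python
def make_prompt(word: str) -> str:
    """Format a word as the model expects: the word followed by ':'."""
    return f"{word.strip()}:"

def extract_pronunciation(generated: str) -> str:
    """Split the model's full-text output ('word:pronunciation') at the first ':'."""
    return generated.split(":", 1)[1].strip()

# Querying the model with the transformers library (repo id below is a placeholder):
# from transformers import pipeline
# generator = pipeline("text-generation", model="Marxav/frpron")
# out = generator(make_prompt("bonjour"), return_full_text=True)[0]["generated_text"]
# pronunciation = extract_pronunciation(out)
```

Because the widget is configured with `return_full_text: true`, the generated string includes the original prompt, which is why the output is split at the first ":".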
This model remains experimental. Additional finetuning is needed for:
* [Homographs with different pronunciations](https://fr.wiktionary.org/wiki/Catégorie:Homographes_non_homophones_en_français),
* [French liaisons](https://en.wikipedia.org/wiki/Liaison_(French)),
* [Roman numerals](https://en.wikipedia.org/wiki/Roman_numerals).
The input length is currently limited to a maximum of 60 letters.
This work is derived from the [OTEANN paper](https://aclanthology.org/2021.sigtyp-1.1/) and [code](https://github.com/marxav/oteann3), which used [minGPT](https://github.com/karpathy/minGPT).
## More information on the model, dataset, hardware, and environmental considerations
### **The training data**
The dataset used to train this model comes from the [French Wiktionary](https://fr.wiktionary.org/).
### **The model**
The model is built on [gpt2](https://huggingface.co/gpt2).