
# FongBERT

FongBERT is a BERT model trained on more than 50,000 sentences in Fon. The data were compiled from the JW300 corpus together with additional data I scraped from the JW website. It is the first pretrained model to leverage transfer learning for downstream tasks in Fon. Below are some examples of masked-word prediction.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("Gilles/FongBERT")
model = AutoModelForMaskedLM.from_pretrained("Gilles/FongBERT")

fill = pipeline('fill-mask', model=model, tokenizer=tokenizer)
```

## Example 1

**Sentence 1:** *wa wazɔ xa mi.* (Translation: *come to work with me.*)

**Masked sentence:** *wa wazɔ xa `<mask>`.* (Translation: *come to work with `<mask>`.*)

```python
fill(f'wa wazɔ xa {fill.tokenizer.mask_token}')
```

```python
[{'score': 0.9988399147987366, 'sequence': 'wa wazɔ xa mi', 'token': 391, 'token_str': ' mi'},
 {'score': 0.00041466866969130933, 'sequence': 'wa wazɔ xa wɛ', ...}]
```

## Example 2

**Sentence 2:** *un yi wan nu we ɖesu.* (Translation: *I love you so much.*)

**Masked sentence:** *un yi `<mask>` nu we ɖesu.* (Translation: *I `<mask>` you so much.*)

```python
[{'score': 0.8948522210121155, 'sequence': 'un yi wan nu we ɖesu', 'token': 702, 'token_str': ' wan'},
 {'score': 0.06282926350831985, 'sequence': 'un yi ɖɔ nu we ɖesu', ...}]
```

## Example 3

**Sentence 3:** *un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú táan ɖé.* (Translation: *I went to my boyfriend for a while.*)

**Masked sentence:** *un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú `<mask>` ɖé.* (Translation: *I went to my boyfriend for a `<mask>`.*)

```python
[{'score': 0.2686346471309662, 'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú é ɖé', 'token': 278, 'token_str': ' é'},
 {'score': 0.1764318197965622, 'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú táan ɖé', 'token': 1205, ...}]
```
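Each `fill()` call returns a list of candidate dictionaries sorted by descending score, as shown above. As a small illustrative sketch (the `top_prediction` helper below is hypothetical, not part of the `transformers` API), the most likely completion can be pulled out of such a result like this:

```python
# Illustrative helper: extract the highest-scoring candidate from a
# fill-mask pipeline result (a list of dicts with 'score' and 'token_str').
def top_prediction(results):
    best = max(results, key=lambda r: r["score"])
    # token_str often carries a leading space from the tokenizer, so strip it.
    return best["token_str"].strip(), best["score"]

# Example using the scores shown above for "wa wazɔ xa <mask>":
sample = [
    {"score": 0.9988399147987366, "sequence": "wa wazɔ xa mi", "token": 391, "token_str": " mi"},
    {"score": 0.00041466866969130933, "sequence": "wa wazɔ xa wɛ", "token_str": " wɛ"},
]
print(top_prediction(sample))  # ('mi', 0.9988399147987366)
```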