FongBERT
FongBERT is a BERT model trained on 68,363 sentences in Fon. The data were compiled from the JW300 corpus plus additional data scraped from the JW website. It is the first pretrained model to leverage transfer learning for downstream tasks for Fon; a fine-tuning sketch appears after the examples. Below are some examples of masked-word prediction.
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("Gilles/FongBERT")
model = AutoModelForMaskedLM.from_pretrained("Gilles/FongBERT")
fill = pipeline('fill-mask', model=model, tokenizer=tokenizer)
```
Example 1
Sentence 1: un tuùn ɖɔ un jló na wazɔ̌ nú we. Translation: I know I have to work for you.

Masked Sentence: un tuùn ɖɔ un jló na wazɔ̌ `<mask>` we. Translation: I know I have to work `<mask>` you.

```python
fill(f'un tuùn ɖɔ un jló na wazɔ̌ {fill.tokenizer.mask_token} we')
```

```python
[{'score': 0.994536280632019, 'sequence': 'un tuùn ɖɔ un jló na wazɔ̌ nú we', 'token': 312, 'token_str': ' nú'},
 {'score': 0.0015309195732697845, 'sequence': 'un tuùn ɖɔ un jló na wazɔ̌nu we', ...}]
```
Example 2
Sentence 2: un yi wan nu we ɖesu. Translation: I love you so much.

Masked Sentence: un yi `<mask>` nu we ɖesu. Translation: I `<mask>` you so much.
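Querying the pipeline with this masked sentence, following the same pattern as Example 1:

```python
fill(f'un yi {fill.tokenizer.mask_token} nu we ɖesu')
```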
```python
[{'score': 0.31483960151672363, 'sequence': 'un yi wan nu we ɖesu', 'token': 639, 'token_str': ' wan'},
 {'score': 0.20940221846103668, 'sequence': 'un yi ba nu we ɖesu', ...}]
```
Example 3
Sentence 3: un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú táan ɖé. Translation: I went to my boyfriend Tony for a while.

Masked Sentence: un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú `<mask>` ɖé. Translation: I went to my boyfriend Tony for a `<mask>`.
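Again, the prediction comes from the same pipeline call:

```python
fill(f'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú {fill.tokenizer.mask_token} ɖé')
```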
```python
[{'score': 0.934298574924469, 'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú táan ɖé', 'token': 1102, 'token_str': ' táan'},
 {'score': 0.03750855475664139, 'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú ganxixo ɖé', ...}]
```
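Since FongBERT is intended as a base for transfer learning, here is a minimal fine-tuning sketch for a downstream sequence-classification task. The task, labels, and hyperparameters are illustrative assumptions, not part of this release; replace the toy data with a real labeled Fon dataset.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load FongBERT with a fresh classification head (num_labels=2 is an assumption).
tokenizer = AutoTokenizer.from_pretrained("Gilles/FongBERT")
model = AutoModelForSequenceClassification.from_pretrained("Gilles/FongBERT", num_labels=2)

# Hypothetical toy training data: two Fon sentences with made-up labels.
texts = ['un yi wan nu we ɖesu', 'un tuùn ɖɔ un jló na wazɔ̌ nú we']
labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    # Tokenize the batch and compute the classification loss.
    batch = tokenizer(texts, truncation=True, padding=True, return_tensors='pt')
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```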