# FongBERT

FongBERT is a BERT model trained on more than 50,000 sentences in [Fon](https://en.wikipedia.org/wiki/Fon_language). The data were compiled from [JW300](https://opus.nlpl.eu/JW300.php) and additional data I scraped from the [JW](https://www.jw.org/en/) website. It is the first pretrained model to leverage transfer learning for downstream tasks in Fon.

Below are some examples of masked-word prediction.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("Gilles/FongBERT")
model = AutoModelForMaskedLM.from_pretrained("Gilles/FongBERT")

fill = pipeline('fill-mask', model=model, tokenizer=tokenizer)
```

#### Example 1

**Sentence 1**: wa wazɔ xa mi . **Translation**: come to work with me.

**Masked Sentence**: wa wazɔ xa <"mask"> . **Translation**: come to work with <"mask"> .

```python
fill(f'wa wazɔ xa {fill.tokenizer.mask_token}')

[{'score': 0.9988399147987366,
  'sequence': 'wa wazɔ xa mi',
  'token': 391,
  'token_str': ' mi'},
 {'score': 0.00041466866969130933,
  'sequence': 'wa wazɔ xa wɛ',
  ...........]
```

#### Example 2

**Sentence 2**: un yi wan nu we ɖesu . **Translation**: I love you so much.

**Masked Sentence**: un yi <"mask"> nu we ɖesu . **Translation**: I <"mask"> you so much.

```python
fill(f'un yi {fill.tokenizer.mask_token} nu we ɖesu')

[{'score': 0.8948522210121155,
  'sequence': 'un yi wan nu we ɖesu',
  'token': 702,
  'token_str': ' wan'},
 {'score': 0.06282926350831985,
  'sequence': 'un yi ɖɔ nu we ɖesu',
  ...........]
```

#### Example 3

**Sentence 3**: un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú táan ɖé . **Translation**: I went to my boyfriend for a while.

**Masked Sentence**: un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú <"mask"> ɖé . **Translation**: I went to my boyfriend for a <"mask"> .

```python
fill(f'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú {fill.tokenizer.mask_token} ɖé')

[{'score': 0.2686346471309662,
  'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú é ɖé',
  'token': 278,
  'token_str': ' é'},
 {'score': 0.1764318197965622,
  'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú táan ɖé',
  'token': 1205,
  ...........]
```
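As the outputs above show, the pipeline returns a list of candidate completions sorted by descending `score`, each with the filled-in `sequence` and the predicted `token`/`token_str`. Continuing from the setup above, a minimal sketch of picking the top candidate programmatically:

```python
# Take the highest-scoring completion from the fill-mask pipeline.
predictions = fill(f'wa wazɔ xa {fill.tokenizer.mask_token}')
best = predictions[0]  # entries are already sorted by descending score
print(best['sequence'], best['score'])  # e.g. "wa wazɔ xa mi 0.9988..."
```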
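For downstream tasks, one standard `transformers` route (a sketch of the generic pattern, not a recipe specific to FongBERT) is to load the encoder without the masked-LM head and use its hidden states as features. The mean-pooling step below is an illustrative assumption; other pooling strategies work too.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Gilles/FongBERT")
model = AutoModel.from_pretrained("Gilles/FongBERT")  # encoder only, no MLM head

inputs = tokenizer("wa wazɔ xa mi", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token representations into one fixed-size sentence vector,
# which can then feed a classifier or other downstream model.
sentence_embedding = outputs.last_hidden_state.mean(dim=1)
```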