# Sinhala-RoBERTa
# Description:
This is a small RoBERTa model pre-trained on the Sinhala language with a masked language modeling (MLM) objective. It was trained on the Sinhala portion of the OSCAR dataset.
# How to Use:
The model can be used directly with a pipeline for masked language modeling:
```python
>>> from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
>>> tokenizer = AutoTokenizer.from_pretrained("d42kw01f/Sinhala-RoBERTa")
>>> model = AutoModelForMaskedLM.from_pretrained("d42kw01f/Sinhala-RoBERTa")
>>> fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
>>> fill_mask("මම ගෙදර <mask>.")
[{'score': 0.1822454035282135,
  'sequence': 'මම ගෙදර ආව.',
  'token': 701,
  'token_str': ' ආව'},
 {'score': 0.10513380169868469,
  'sequence': 'මම ගෙදර ය.',
  'token': 310,
  'token_str': ' ය'},
 {'score': 0.06417194753885269,
  'sequence': 'මම ගෙදර එක.',
  'token': 328,
  'token_str': ' එක'},
 {'score': 0.05026362091302872,
  'sequence': 'මම ගෙදර ඇත.',
  'token': 330,
  'token_str': ' ඇත'},
 {'score': 0.029960114508867264,
  'sequence': 'මම ගෙදර යනව.',
  'token': 834,
  'token_str': ' යනව'}]
```
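You can also call the model directly instead of going through the pipeline. The following is a minimal PyTorch sketch, assuming the same checkpoint and example sentence as above; it scores the `<mask>` position and decodes the top five candidate tokens:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("d42kw01f/Sinhala-RoBERTa")
model = AutoModelForMaskedLM.from_pretrained("d42kw01f/Sinhala-RoBERTa")

# Tokenize a sentence containing the <mask> token.
inputs = tokenizer("මම ගෙදර <mask>.", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Locate the <mask> position and take the top-5 predicted token ids there.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_tokens = logits[0, mask_index].topk(5).indices[0]

print(tokenizer.decode(top_tokens))  # candidate fillers for <mask>
```

This produces the same ranking as the pipeline output above, without the extra formatting the pipeline applies.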