# Description:
This is a smaller pre-trained model for the Sinhala language, trained with Masked Language Modeling (MLM) on the OSCAR Sinhala dataset.
# How to Use:
The model can be used directly with a pipeline for masked language modeling:
```python
>>> from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
>>> tokenizer = AutoTokenizer.from_pretrained("d42kw01f/Sinhala-RoBERTa")
>>> model = AutoModelForMaskedLM.from_pretrained("d42kw01f/Sinhala-RoBERTa")
>>> fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
>>> fill_mask("මම ගෙදර <mask>.")
[{'score': 0.1822454035282135,
'sequence': 'මම ගෙදර ආව.',
'token': 701,
'token_str': ' ආව'},
{'score': 0.10513380169868469,
'sequence': 'මම ගෙදර ය.',
'token': 310,
'token_str': ' ය'},
{'score': 0.06417194753885269,
'sequence': 'මම ගෙදර එක.',
'token': 328,
'token_str': ' එක'},
{'score': 0.05026362091302872,
'sequence': 'මම ගෙදර ඇත.',
'token': 330,
'token_str': ' ඇත'},
{'score': 0.029960114508867264,
'sequence': 'මම ගෙදර යනව.',
'token': 834,
'token_str': ' යනව'}]
```
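For more control than the pipeline helper offers, the model can also be queried directly by running a forward pass and reading the logits at the `<mask>` position. The following is a minimal sketch (not from the original card) that reproduces the top-5 fill-mask candidates manually:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("d42kw01f/Sinhala-RoBERTa")
model = AutoModelForMaskedLM.from_pretrained("d42kw01f/Sinhala-RoBERTa")

text = "මම ගෙදර <mask>."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Find the position of the <mask> token in the input sequence.
mask_idx = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

# Take the five highest-scoring candidate tokens at that position.
top5 = logits[0, mask_idx].topk(5, dim=-1).indices[0]
candidates = [tokenizer.decode(t) for t in top5]
print(candidates)
```

This mirrors what the `fill-mask` pipeline does internally and is convenient when you want raw scores or batched inference.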