# Description:

This is a smaller pre-trained model for the Sinhala language, trained with Masked Language Modeling (MLM) on the OSCAR Sinhala dataset.
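
For reference, below is a minimal, illustrative sketch of how MLM training on the OSCAR Sinhala data could be set up with the Transformers `Trainer` (shown here as continued pre-training from the released checkpoint). This is not the exact script or hyperparameters used to train this model; the dataset config name, output directory, and training arguments are assumptions for illustration only.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("d42kw01f/Sinhala-RoBERTa")
model = AutoModelForMaskedLM.from_pretrained("d42kw01f/Sinhala-RoBERTa")

# OSCAR's Sinhala split; this config name is an assumption, not taken from this card.
dataset = load_dataset("oscar", "unshuffled_deduplicated_si", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Randomly mask 15% of tokens for the MLM objective (the conventional default).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sinhala-roberta-mlm", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```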

# How to Use:
The model can be used directly with a pipeline for masked language modeling:
```python
>>> from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
>>> tokenizer = AutoTokenizer.from_pretrained("d42kw01f/Sinhala-RoBERTa")
>>> model = AutoModelForMaskedLM.from_pretrained("d42kw01f/Sinhala-RoBERTa")
>>> fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
>>> fill_mask("මම ගෙදර <mask>.")
[{'score': 0.1822454035282135,
  'sequence': 'මම ගෙදර ආව.',
  'token': 701,
  'token_str': ' ආව'},
 {'score': 0.10513380169868469,
  'sequence': 'මම ගෙදර ය.',
  'token': 310,
  'token_str': ' ය'},
 {'score': 0.06417194753885269,
  'sequence': 'මම ගෙදර එක.',
  'token': 328,
  'token_str': ' එක'},
 {'score': 0.05026362091302872,
  'sequence': 'මම ගෙදර ඇත.',
  'token': 330,
  'token_str': ' ඇත'},
 {'score': 0.029960114508867264,
  'sequence': 'මම ගෙදර යනව.',
  'token': 834,
  'token_str': ' යනව'}]
```
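
Alternatively, the pipeline can be created directly from the model id, letting it download the model and tokenizer itself; this is standard Transformers usage and should return the same predictions as above:

```python
>>> from transformers import pipeline
>>> fill_mask = pipeline('fill-mask', model="d42kw01f/Sinhala-RoBERTa")
>>> fill_mask("මම ගෙදර <mask>.")[0]['sequence']
'මම ගෙදර ආව.'
```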