Model Description

This is a Khmer-language fill-mask model built on top of the pre-trained FacebookAI/xlm-roberta-base model. It was fine-tuned on roughly 26K Khmer sentences/clauses, split 80% for training and 20% for validation. The model is intended for Khmer text only and is not expected to perform well on other languages.
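The card does not say how the 80/20 split was produced. As a hedged illustration only, the sketch below assumes a seeded random shuffle followed by a cut at 80%. The function name `split_train_validation`, the seed, and the placeholder sentences are all hypothetical. On 26,000 items this gives 20,800 training and 5,200 validation examples.

```python
import random

def split_train_validation(sentences, train_percent=80, seed=42):
    """Hypothetical helper: shuffle a copy of `sentences` and split it into (train, validation)."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = len(shuffled) * train_percent // 100  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

# Placeholder stand-ins for the ~26K Khmer sentences/clauses (not the real data).
sentences = [f"sentence {i}" for i in range(26000)]
train, validation = split_train_validation(sentences)
print(len(train), len(validation))  # 20800 5200
```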

Model Usage

>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='channudam/khmer-xlm-roberta-base')
>>> unmasker("αž’αžΆαž€αžΆαžŸαž’αžΆαžαž»αž€αŸ’αžŠαŸ…αžαŸ’αž›αžΆαŸ†αž„ αž…αžΌαžšαž’αŸ’αž“αž€αž•αžΉαž€<mask>αž²αŸ’αž™αž”αžΆαž“αž…αŸ’αžšαžΎαž“αŸ”")

[
  {
    'score': 0.9788032174110413,
    'token': 41440,
    'token_str': 'αž‘αžΉαž€',
    'sequence': 'αž’αžΆαž€αžΆαžŸαž’αžΆαžαž»αž€αŸ’αžŠαŸ…αžαŸ’αž›αžΆαŸ†αž„ αž…αžΌαžšαž’αŸ’αž“αž€αž•αžΉαž€αž‘αžΉαž€ αž²αŸ’αž™αž”αžΆαž“αž…αŸ’αžšαžΎαž“αŸ”'
  },
  {
    'score': 0.012485685758292675,
    'token': 191670,
    'token_str': 'αžŸαŸ’αžšαžΆ',
    'sequence': 'αž’αžΆαž€αžΆαžŸαž’αžΆαžαž»αž€αŸ’αžŠαŸ…αžαŸ’αž›αžΆαŸ†αž„ αž…αžΌαžšαž’αŸ’αž“αž€αž•αžΉαž€αžŸαŸ’αžšαžΆ αž²αŸ’αž™αž”αžΆαž“αž…αŸ’αžšαžΎαž“αŸ”'
  },
  {
    'score': 0.0014946138253435493,
    'token': 162483,
    'token_str': 'αž”αžΆαž™',
    'sequence': 'αž’αžΆαž€αžΆαžŸαž’αžΆαžαž»αž€αŸ’αžŠαŸ…αžαŸ’αž›αžΆαŸ†αž„ αž…αžΌαžšαž’αŸ’αž“αž€αž•αžΉαž€αž”αžΆαž™ αž²αŸ’αž™αž”αžΆαž“αž…αŸ’αžšαžΎαž“αŸ”'
  },
  {
    'score': 0.001305083278566599,
    'token': 49245,
    'token_str': 'ស៊ី',
    'sequence': 'αž’αžΆαž€αžΆαžŸαž’αžΆαžαž»αž€αŸ’αžŠαŸ…αžαŸ’αž›αžΆαŸ†αž„ αž…αžΌαžšαž’αŸ’αž“αž€αž•αžΉαž€αžŸαŸŠαžΈ αž²αŸ’αž™αž”αžΆαž“αž…αŸ’αžšαžΎαž“αŸ”'
  },
  {
    'score': 0.0007108347490429878,
    'token': 51863,
    'token_str': 'αž‘αžΉαž€',
    'sequence': 'αž’αžΆαž€αžΆαžŸαž’αžΆαžαž»αž€αŸ’αžŠαŸ…αžαŸ’αž›αžΆαŸ†αž„ αž…αžΌαžšαž’αŸ’αž“αž€αž•αžΉαž€ αž‘αžΉαž€ αž²αŸ’αž™αž”αžΆαž“αž…αŸ’αžšαžΎαž“αŸ”'
  }
]
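In the output above, 'αž‘αžΉαž€' appears twice under different token IDs (41440 and 51863); the second sequence shows a space before the word, which suggests a word-initial variant of the same subword. If you want one score per surface form, a minimal post-processing sketch is to sum scores by whitespace-stripped `token_str`. The helper `merge_predictions` is hypothetical and not part of the transformers API, and merging these variants is a judgment call rather than something the model card prescribes.

```python
from collections import defaultdict

def merge_predictions(predictions):
    """Hypothetical helper: sum scores of candidates whose token_str matches after stripping whitespace.

    `predictions` is a list of dicts shaped like the fill-mask pipeline output;
    only the 'token_str' and 'score' keys are used.
    """
    totals = defaultdict(float)
    for pred in predictions:
        totals[pred["token_str"].strip()] += pred["score"]
    # Highest combined score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Scores copied from the example output above.
predictions = [
    {"token_str": "αž‘αžΉαž€", "score": 0.9788032174110413},
    {"token_str": "αžŸαŸ’αžšαžΆ", "score": 0.012485685758292675},
    {"token_str": "αž”αžΆαž™", "score": 0.0014946138253435493},
    {"token_str": "ស៊ី", "score": 0.001305083278566599},
    {"token_str": "αž‘αžΉαž€", "score": 0.0007108347490429878},
]

for word, score in merge_predictions(predictions):
    print(word, round(score, 4))
```

With a live pipeline you would pass `unmasker(...)` output directly instead of the copied list.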
Model size: 278M params
Tensor type: F32 (Safetensors)
Mask token: <mask>