Finetuned xlm-roberta-base model on Thai sequence and token classification datasets


Finetuned XLM-RoBERTa base model on Thai sequence and token classification datasets. The script and documentation can be found at this repository.


Model description


We use the pretrained cross-lingual RoBERTa model proposed by [Conneau et al., 2020]. We download the pretrained PyTorch model from HuggingFace's Model Hub (https://huggingface.co/xlm-roberta-base).
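
As a rough illustration of the download step, the sketch below loads the pretrained checkpoint with a new classification head using the `transformers` Auto classes. The helper name `load_pretrained`, its `task` argument, and the `HEAD_CLASSES` mapping are hypothetical names introduced here; this is not the finetuning script from the repository, and the head it returns is randomly initialised, not finetuned.

```python
# Checkpoint name taken from the Model Hub URL above.
PRETRAINED_ID = "xlm-roberta-base"

# Hypothetical mapping from a task keyword to a transformers Auto class name.
HEAD_CLASSES = {
    "sequence": "AutoModelForSequenceClassification",
    "token": "AutoModelForTokenClassification",
}


def load_pretrained(task: str, num_labels: int):
    """Return (tokenizer, model) for the pretrained encoder plus an untrained head."""
    if task not in HEAD_CLASSES:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(HEAD_CLASSES)}")

    # Imported here so the helper can be defined without the library installed.
    import transformers

    head_cls = getattr(transformers, HEAD_CLASSES[task])
    tokenizer = transformers.AutoTokenizer.from_pretrained(PRETRAINED_ID)
    # num_labels sizes the new classification layer on top of the encoder.
    model = head_cls.from_pretrained(PRETRAINED_ID, num_labels=num_labels)
    return tokenizer, model
```

Calling `load_pretrained("sequence", 4)` would download the weights on first use and requires network access; the value 4 matches the number of classes in wisesight_sentiment.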

Intended uses & limitations


You can use the finetuned models for multiclass and multilabel text classification and for token classification tasks.


Multiclass text classification

  • wisesight_sentiment

    4-class text classification task (positive, neutral, negative, and question) based on social media posts and tweets.

  • wongnai_reviews

    User review rating classification task (ratings range from 1 to 5).

  • generated_reviews_enth : (review_star as label)

    Generated user review rating classification task (ratings range from 1 to 5).

Multilabel text classification

  • prachathai67k

    Thai topic classification with 12 labels based on a news article corpus from prachathai.com. Details are described on this page.
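
The multiclass and multilabel tasks above differ in how a model's output scores are turned into labels: a multiclass head picks exactly one class (softmax, then argmax), while a multilabel head scores each label independently (sigmoid, then a threshold). The following is a minimal sketch of that difference in plain Python. The label order in `SENTIMENT_LABELS`, the helper names, and the 0.5 threshold are assumptions for illustration, not values taken from the finetuned checkpoints.

```python
import math

# Assumed label order for illustration only; check the checkpoint's own label mapping.
SENTIMENT_LABELS = ["positive", "neutral", "negative", "question"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multiclass(logits, labels):
    """Multiclass: return the single most probable label."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]


def decode_multilabel(logits, labels, threshold=0.5):
    """Multilabel: return every label whose independent probability reaches the threshold."""
    return [label for label, x in zip(labels, logits) if sigmoid(x) >= threshold]
```

With a multilabel dataset such as prachathai67k, `decode_multilabel` may return zero, one, or several topics for the same article, whereas `decode_multiclass` always returns exactly one label.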

Token classification

  • thainer

    Named-entity recognition tagging with 13 named entities, as described on this page.

  • lst20 : NER and POS tagging

    Named-entity recognition tagging with 10 named entities and part-of-speech tagging with 16 tags, as described on this page.
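
Token classification models emit one tag per token, so NER output usually needs a post-processing step that groups tagged tokens into entity spans. The sketch below assumes a BIO-style scheme with `B-` and `I-` prefixes and `O` for non-entity tokens; the actual tag format differs between datasets, so treat the prefixes, the tag names in the usage note, and the helper name `bio_to_spans` as assumptions.

```python
def bio_to_spans(tokens, tags):
    """Group per-token BIO tags into (entity_type, text) pairs.

    Assumes tags look like "B-TYPE", "I-TYPE", or "O". An "I-" tag that does not
    continue an open entity of the same type is treated as outside any entity.
    """
    spans = []
    current_type = None
    current_tokens = []

    def flush():
        # Close the entity that is currently open, if any.
        if current_type is not None:
            # Thai text has no spaces between words, so tokens are joined directly.
            spans.append((current_type, "".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans
```

For example, tokens `["นาย", "สมชาย", "ไป", "กรุงเทพ"]` with tags `["B-PERSON", "I-PERSON", "O", "B-LOCATION"]` yield one PERSON span and one LOCATION span.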


How to use


An example of how to use the finetuned models for inference can be found in this Colab notebook.


BibTeX entry and citation info

@misc{lowphansirikul2021wangchanberta,
      title={WangchanBERTa: Pretraining transformer-based Thai Language Models}, 
      author={Lalita Lowphansirikul and Charin Polpanumas and Nawat Jantrakulchai and Sarana Nutanong},
      year={2021},
      eprint={2101.09635},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}