	---
	language: id
	tags:
	- indonesian-roberta-base-sentiment-classifier
	license: mit
	datasets:
	- indonlu
	widget:
	- text: "Jangan sampai saya telpon bos saya ya!"
	---

	## Indonesian RoBERTa Base Sentiment Classifier

	Indonesian RoBERTa Base Sentiment Classifier is a sentiment text-classification model based on the [RoBERTa](https://arxiv.org/abs/1907.11692) model. It was initialized from the pre-trained [Indonesian RoBERTa Base](https://hf.co/flax-community/indonesian-roberta-base) model and then fine-tuned on the `SmSA` dataset from [`indonlu`](https://hf.co/datasets/indonlu), which consists of Indonesian comments and reviews.

	After training, the model achieved a validation accuracy of 93.88% and an F1-macro of 91.57%. On the benchmark test set, it achieved an accuracy of 90.00% and an F1-macro of 85.97%.
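F1-macro is the unweighted mean of the per-class F1 scores. Each sentiment class therefore counts equally, however many examples it has. The plain-Python sketch below computes both reported metrics and mirrors scikit-learn's `accuracy_score` and `f1_score(..., average="macro")`. The card does not say which implementation produced the reported numbers, and the label names in the usage lines are illustrative assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # Undefined precision/recall/F1 (zero denominators) are treated as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Illustrative labels, not taken from SmSA.
y_true = ["positive", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral"]
# accuracy = 0.75, macro-F1 = 7/9 (about 0.778)
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```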

	The model was trained with the `Trainer` class from Hugging Face's [Transformers](https://huggingface.co/transformers) library. PyTorch was used as the backend framework during training, but the model remains compatible with other frameworks.

	## Model

	| Model                                          | #params | Arch.        | Training/Validation data (text) |
	| ---------------------------------------------- | ------- | ------------ | ------------------------------- |
	| `indonesian-roberta-base-sentiment-classifier` | 124M    | RoBERTa Base | `SmSA`                          |

	## Evaluation Results

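	The model was trained for 5 epochs, and the best checkpoint was loaded at the end of training. The table below lists the training loss and the validation metrics recorded after each epoch.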

	| Epoch | Training Loss | Validation Loss | Accuracy | F1       | Precision | Recall   |
	| ----- | ------------- | --------------- | -------- | -------- | --------- | -------- |
	| 1     | 0.346100      | 0.263456        | 0.915079 | 0.888680 | 0.877023  | 0.903502 |
	| 2     | 0.175200      | 0.215166        | 0.930952 | 0.908246 | 0.918557  | 0.898842 |
	| 3     | 0.111700      | 0.227525        | 0.932540 | 0.901823 | 0.916049  | 0.891263 |
	| 4     | 0.071800      | 0.244867        | 0.938889 | 0.915714 | 0.923105  | 0.909921 |
	| 5     | 0.055000      | 0.262004        | 0.935714 | 0.906755 | 0.918607  | 0.898044 |
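The card does not say which metric decided the "best model". In the Transformers `Trainer`, that choice is set by the `load_best_model_at_end`, `metric_for_best_model` and `greater_is_better` fields of `TrainingArguments`. When `metric_for_best_model` is left unset, it defaults to the loss. The plain-Python sketch below replays that selection on the numbers from the table. It is an illustration, not the author's training code. Choosing by accuracy or F1 gives epoch 4, whose scores correspond to the 93.88% accuracy and 91.57% F1-macro quoted above. Choosing by validation loss would give epoch 2.

```python
# Per-epoch validation metrics copied from the evaluation table.
EPOCH_METRICS = {
    1: {"val_loss": 0.263456, "accuracy": 0.915079, "f1": 0.888680},
    2: {"val_loss": 0.215166, "accuracy": 0.930952, "f1": 0.908246},
    3: {"val_loss": 0.227525, "accuracy": 0.932540, "f1": 0.901823},
    4: {"val_loss": 0.244867, "accuracy": 0.938889, "f1": 0.915714},
    5: {"val_loss": 0.262004, "accuracy": 0.935714, "f1": 0.906755},
}


def best_epoch(metrics, key, greater_is_better=True):
    """Return the epoch with the best value of `key` (max if greater is better, else min)."""
    chooser = max if greater_is_better else min
    return chooser(metrics, key=lambda epoch: metrics[epoch][key])


for name, higher in [("accuracy", True), ("f1", True), ("val_loss", False)]:
    print(name, "->", best_epoch(EPOCH_METRICS, name, greater_is_better=higher))
```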

	## How to Use

	### As Text Classifier

	```python
	from transformers import pipeline

	pretrained_name = "w11wo/indonesian-roberta-base-sentiment-classifier"

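	# Downloads the fine-tuned weights and tokenizer from the Hugging Face Hub.
	nlp = pipeline(
	    "sentiment-analysis",
	    model=pretrained_name,
	    tokenizer=pretrained_name,
	)

	# Returns a list with one {"label": ..., "score": ...} dict per input text.
	print(nlp("Jangan sampai saya telpon bos saya ya!"))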
	```
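When the pipeline is called on a list of strings, it returns one dictionary per input, holding the top predicted `label` and its `score`. The helper below is a hypothetical post-processing step, not part of Transformers. It tallies predicted labels across a batch and can drop low-confidence predictions. The label strings and scores in `example` are made-up placeholders; the real label names are whatever the model's config defines.

```python
from collections import Counter


def tally_sentiments(predictions, min_score=0.0):
    """Count predicted labels, skipping predictions whose score is below `min_score`.

    `predictions` follows the text-classification pipeline output for a list of
    inputs: one {"label": str, "score": float} dict per text.
    """
    return Counter(p["label"] for p in predictions if p["score"] >= min_score)


# Hypothetical pipeline output for three texts (placeholder labels and scores).
example = [
    {"label": "negative", "score": 0.98},
    {"label": "positive", "score": 0.91},
    {"label": "negative", "score": 0.55},
]
print(tally_sentiments(example, min_score=0.6))
```

The `min_score` cut-off is an arbitrary illustrative value. A useful threshold would need to be chosen on held-out data.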

	## Disclaimer

	Please consider the biases from both the pre-trained RoBERTa model and the `SmSA` dataset, which may carry over into this model's results.

	## Author

	Indonesian RoBERTa Base Sentiment Classifier was trained and evaluated by [Wilson Wongso](https://w11wo.github.io/). All computation and development were done on Google Colaboratory using its free GPU access.