---
license: mit
language: id
base_model: indolem/indobert-base-uncased
widget:
- text: Pelayanan lama, sangat tidak memuaskan.
  example_title: Sentiment analysis
datasets:
- indonlp/indonlu
- sepidmnorozy/Indonesian_sentiment
pipeline_tag: text-classification
---
### Model Details
This model is a fine-tuned version of [IndoBERT Base Uncased](https://huggingface.co/indolem/indobert-base-uncased), a BERT model pre-trained on Indonesian text data. It was fine-tuned to perform sentiment analysis on Indonesian comments and reviews.
It was trained on the `SmSA` subset of [indonlu](https://huggingface.co/datasets/indonlp/indonlu) and on the [indonesian_sentiment](https://huggingface.co/datasets/sepidmnorozy/Indonesian_sentiment) dataset.
The model classifies a given Indonesian review text into one of three categories:
* Negative
* Neutral
* Positive
### Training hyperparameters
* train_batch_size: 32
* eval_batch_size: 32
* learning_rate: 1e-4
* optimizer: AdamW with betas=(0.9, 0.999), eps=1e-8, and weight_decay=0.01
* epochs: 3
* learning_rate_scheduler: StepLR with step_size=592, gamma=0.1
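The StepLR settings above imply a piecewise-constant learning rate. The sketch below computes it without any dependencies, assuming the scheduler is stepped once per optimizer step (per batch) rather than once per epoch; the card does not state this. The helper name `step_lr_at` is hypothetical and used only for illustration.

```python
# Hyperparameters taken from the list above.
BASE_LR = 1e-4
STEP_SIZE = 592
GAMMA = 0.1


def step_lr_at(step, base_lr=BASE_LR, step_size=STEP_SIZE, gamma=GAMMA):
    """Learning rate after `step` scheduler steps under a StepLR-style decay.

    The rate is multiplied by `gamma` once every `step_size` steps.
    Assumption: one scheduler step per training batch.
    """
    return base_lr * gamma ** (step // step_size)


# Show the rate at the start of each decay interval.
for step in (0, STEP_SIZE, 2 * STEP_SIZE):
    print(step, step_lr_at(step))
```

Under this assumption the rate stays at 1e-4 for the first 592 steps and then drops by a factor of 10 every 592 steps after that.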
### Training Results
The following table shows the training results for the model:
| Epoch | Loss | Accuracy |
|---|---|---|
| 1 | 0.2936 | 0.9310 |
| 2 | 0.1212 | 0.9526 |
| 3 | 0.0795 | 0.9569 |
### How to Use
You can load the model and perform inference as follows:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("taufiqdp/indonesian-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("taufiqdp/indonesian-sentiment")

# Class labels (negative, neutral, positive), indexed by output position
class_names = ['negatif', 'netral', 'positif']

text = "Pelayanan lama dan tidak ramah"
tokenized_text = tokenizer(text, return_tensors='pt')

# Run a forward pass without tracking gradients
with torch.inference_mode():
    logits = model(**tokenized_text)['logits']

# Pick the highest-scoring class for the single input
result = class_names[logits.argmax(dim=1).item()]
print(result)
```
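To report a confidence score alongside the predicted label, the logits can be passed through a softmax. The following is a dependency-free sketch of that step. The logit values are hypothetical, and the helper names `softmax` and `classify` are illustrative only. With PyTorch tensors, `torch.softmax(logits, dim=1)` would give the same probabilities.

```python
import math

# Same label order as in the usage example above.
CLASS_NAMES = ["negatif", "netral", "positif"]


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


# Hypothetical logits for one input, not real model output.
example_logits = [2.1, -0.3, -1.5]
label, confidence = classify(example_logits)
print(label, round(confidence, 3))
```

A low top probability can be used as a signal to route a review for manual inspection; the threshold is a choice left to the application.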
### Citation
```
@misc{koto2020indolem,
title={IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP},
author={Fajri Koto and Afshin Rahimi and Jey Han Lau and Timothy Baldwin},
year={2020},
eprint={2011.00677},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@inproceedings{purwarianti2019improving,
title={Improving Bi-LSTM Performance for Indonesian Sentiment Analysis Using Paragraph Vector},
author={Ayu Purwarianti and Ida Ayu Putu Ari Crisdayanti},
booktitle={Proceedings of the 2019 International Conference of Advanced Informatics: Concepts, Theory and Applications (ICAICTA)},
pages={1--5},
year={2019},
organization={IEEE}
}
```