Indonesia DistilledBERT Sentiment Classification

Overview

This model is a fine-tuned version of DistilBERT for sentiment classification of Indonesian text. It categorizes text into one of three sentiment classes: positive, negative, or neutral.

Model Details

  • Model Type: DistilBERT
  • Language: Indonesian
  • Task: Sentiment Classification (positive / negative / neutral)
  • Base Model: distilbert-base-uncased
  • Size: ~67M parameters (F32, Safetensors)

Usage

You can use this model directly with the Hugging Face transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

pretrained = "fathurfrs/indonesia-distilledbert-sentiment-classification"

model = AutoModelForSequenceClassification.from_pretrained(pretrained)
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

sentiment_analysis = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

# Map the generic LABEL_* outputs to readable sentiment names.
id2label = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}

text = "Saya sangat senang dengan pelayanan di restoran ini."
result = sentiment_analysis(text)[0]  # run the pipeline once and reuse the result
sentiment = id2label[result["label"]]
score = result["score"]
print(sentiment, score)
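The pipeline returns a list of dicts with "label" and "score" keys, one per input text. When scoring many texts, a small helper keeps the label-mapping step reusable (a minimal sketch; `decode_outputs` and the simulated outputs below are illustrative, not part of this model card):

```python
# LABEL_* mapping from the usage snippet above.
id2label = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}

def decode_outputs(outputs, mapping=id2label):
    """Convert raw pipeline dicts into readable (sentiment, score) pairs."""
    return [(mapping[o["label"]], o["score"]) for o in outputs]

# Simulated pipeline output for two texts (same shape as the
# transformers "sentiment-analysis" pipeline produces):
fake_outputs = [
    {"label": "LABEL_2", "score": 0.98},
    {"label": "LABEL_0", "score": 0.87},
]
print(decode_outputs(fake_outputs))  # → [('positive', 0.98), ('negative', 0.87)]
```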

Training Data

This model was fine-tuned on a dataset of Indonesian tweets covering various topics such as politics, freedom of speech, Indonesian culture, and more. The dataset includes a balanced distribution of positive, negative, and neutral sentiments.
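The card does not describe how the tweets were preprocessed. If you feed the model raw tweets at inference time, a light normalization pass such as the one below may help; note that `clean_tweet` and its rules are assumptions for illustration, not the card's actual pipeline:

```python
import re

def clean_tweet(text):
    """Lightly normalize a tweet: strip URLs, mentions/hashtags, extra spaces."""
    text = re.sub(r"https?://\S+", "", text)  # remove URLs
    text = re.sub(r"[@#]\w+", "", text)       # remove @mentions and #hashtags
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text

print(clean_tweet("Halo @user cek https://t.co/abc #promo"))  # → Halo cek
```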

Performance

The model achieves the following performance on our test set:

  • Accuracy: 92.3214%
  • F1 Score: 91.9843%
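The card does not state how the F1 score is averaged across the three classes. Assuming macro-averaging, both metrics can be reproduced from predictions with plain Python (a sketch; the averaging choice and the tiny example data are assumptions):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (macro average)."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Tiny worked example (hypothetical labels, not the model's test set):
y_true = ["positive", "negative", "positive", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral"]
print(accuracy(y_true, y_pred))  # → 0.75
print(macro_f1(y_true, y_pred, ["positive", "negative", "neutral"]))
```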

Limitations

  • The model's performance may vary on texts from domains significantly different from the training data.
  • It may not capture very subtle or context-dependent sentiments.
  • The model's understanding is limited to the Indonesian language and may not perform well on mixed-language texts.

Ethical Considerations

This model is intended for sentiment analysis of public texts. Users should be aware of potential biases in the training data and use the model responsibly, especially in sensitive applications.
