Indonesia DistilledBERT Sentiment Classification

Overview

This model is a fine-tuned version of DistilBERT for sentiment classification of Indonesian text. It categorizes text into one of three sentiment classes: positive, negative, or neutral.

Model Details

  • Model Type: DistilBERT
  • Language: Indonesian
  • Task: Sentiment Classification (positive / negative / neutral)
  • Base Model: distilbert-base-uncased
  • Size: ~67M parameters (F32, Safetensors)

Usage

You can use this model directly with the Hugging Face transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

pretrained = "fathurfrs/indonesia-distilledbert-sentiment-classification"

model = AutoModelForSequenceClassification.from_pretrained(pretrained)
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

sentiment_analysis = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

# Map the generic LABEL_* outputs to readable sentiment names.
id2label = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}

text = "Saya sangat senang dengan pelayanan di restoran ini."
result = sentiment_analysis(text)[0]  # run the pipeline once and reuse the result
sentiment = id2label[result["label"]]
score = result["score"]
print(sentiment, score)
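The pipeline returns a list of dicts with "label" and "score" keys, one per input text. When scoring many texts, a small helper keeps the label-mapping step reusable (a minimal sketch; `decode_outputs` and the simulated outputs below are illustrative, not part of this model card):

```python
# LABEL_* mapping from the usage snippet above.
id2label = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}

def decode_outputs(outputs, mapping=id2label):
    """Convert raw pipeline dicts into readable (sentiment, score) pairs."""
    return [(mapping[o["label"]], o["score"]) for o in outputs]

# Simulated pipeline output for two texts (same shape as the
# transformers "sentiment-analysis" pipeline produces):
fake_outputs = [
    {"label": "LABEL_2", "score": 0.98},
    {"label": "LABEL_0", "score": 0.87},
]
print(decode_outputs(fake_outputs))  # → [('positive', 0.98), ('negative', 0.87)]
```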

Training Data

This model was fine-tuned on a dataset of Indonesian tweets covering various topics such as politics, freedom of speech, Indonesian culture, and more. The dataset includes a balanced distribution of positive, negative, and neutral sentiments.
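The card does not describe how the tweets were preprocessed. If you feed the model raw tweets at inference time, a light normalization pass such as the one below may help; note that `clean_tweet` and its rules are assumptions for illustration, not the card's actual pipeline:

```python
import re

def clean_tweet(text):
    """Lightly normalize a tweet: strip URLs, mentions/hashtags, extra spaces."""
    text = re.sub(r"https?://\S+", "", text)  # remove URLs
    text = re.sub(r"[@#]\w+", "", text)       # remove @mentions and #hashtags
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text

print(clean_tweet("Halo @user cek https://t.co/abc #promo"))  # → Halo cek
```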

Performance

The model achieves the following performance on our test set:

  • Accuracy: 92.3214%
  • F1 Score: 91.9843%
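The card does not state how the F1 score is averaged across the three classes. Assuming macro-averaging, both metrics can be reproduced from predictions with plain Python (a sketch; the averaging choice and the tiny example data are assumptions):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (macro average)."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Tiny worked example (hypothetical labels, not the model's test set):
y_true = ["positive", "negative", "positive", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral"]
print(accuracy(y_true, y_pred))  # → 0.75
print(macro_f1(y_true, y_pred, ["positive", "negative", "neutral"]))
```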

Limitations

  • The model's performance may vary on texts from domains significantly different from the training data.
  • It may not capture very subtle or context-dependent sentiments.
  • The model's understanding is limited to the Indonesian language and may not perform well on mixed-language texts.

Ethical Considerations

This model is intended for sentiment analysis of public texts. Users should be aware of potential biases in the training data and use the model responsibly, especially in sensitive applications.
