---
language: en
tags:
  - sentiment-analysis
  - text-classification
  - transformers
  - distilbert
datasets:
  - lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
model-index:
  - name: DistilBERT Sentiment Classifier
    results:
      - task:
          type: text-classification
          name: Sentiment Analysis
        dataset:
          name: IMDB Dataset of 50K Movie Reviews
          type: text
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.93
          - name: F1
            type: f1
            value: 0.93
          - name: Precision
            type: precision
            value: 0.93
          - name: Recall
            type: recall
            value: 0.93
license: apache-2.0
metrics:
  - accuracy
  - precision
  - recall
---

DistilBERT Sentiment Classifier

Model Details

  • Model Type: Transformer-based classifier (DistilBERT)

  • Base Model: distilbert-base-uncased

  • Language: English

  • Task: Sentiment Analysis (binary classification)

  • Labels: 0 → Negative, 1 → Positive

  • Framework: Hugging Face Transformers

Intended Uses & Limitations

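Intended Use:

Binary sentiment classification (positive vs. negative) of English-language reviews, comments, or feedback.

Out-of-Scope Use:

Text in languages other than English.

Three-way or fine-grained sentiment tasks (e.g., neutral or mixed sentiment). The model has only two output classes, so it always assigns one of them.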

⚠️ Limitations:

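  • Trained only on movie reviews, so accuracy may drop on other domains and writing styles (e.g., short social-media posts).

  • The training data may contain cultural and linguistic biases that the model can reproduce in its predictions.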

Training Dataset

  • Source: Kaggle Cleaned IMDB Reviews Dataset

  • Size: ~50,000 reviews

  • Classes: positive, negative

  • Converted to integers: positive → 1, negative → 0
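A minimal sketch of the label encoding described above, using only the Python standard library. It assumes the raw data is a CSV with a header row containing `review` and `sentiment` columns (the layout of the Kaggle IMDB 50K file); the helper name `load_reviews` and the sample rows are hypothetical.

```python
import csv
import io

# Encoding stated in this card: negative -> 0, positive -> 1
LABEL2ID = {"negative": 0, "positive": 1}
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}


def load_reviews(csv_file) -> tuple[list[str], list[int]]:
    """Read review/sentiment rows and encode the sentiment strings as integers.

    Assumes a header row with `review` and `sentiment` columns (adjust the
    keys if your copy of the data differs). An unexpected sentiment value
    raises KeyError instead of being silently mislabeled.
    """
    texts, labels = [], []
    for row in csv.DictReader(csv_file):
        texts.append(row["review"])
        labels.append(LABEL2ID[row["sentiment"].strip().lower()])
    return texts, labels


# Two hypothetical rows in the assumed layout
sample = io.StringIO(
    "review,sentiment\n"
    '"A wonderful, moving film.",positive\n'
    '"Two hours I will never get back.",negative\n'
)
texts, labels = load_reviews(sample)
print(labels)  # [1, 0]
```

Failing loudly on unknown labels is a deliberate choice: a stray value such as "neutral" would otherwise end up in the wrong class.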

Training Procedure

  • Epochs: 3

  • Batch Size: 16

  • Optimizer: AdamW

  • Learning Rate: 5e-5

  • Framework: Hugging Face Trainer API
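The card does not name a learning-rate schedule. If the Trainer defaults were kept (linear decay to zero with no warmup), and assuming a single device with no gradient accumulation, the hyperparameters above translate into optimizer steps and learning rates roughly as in this sketch. The training-set size of 40,000 is a hypothetical round number, since the card gives only the overall dataset size, and the function names are illustrative.

```python
import math

# Hyperparameters listed in this card
EPOCHS = 3
BATCH_SIZE = 16
PEAK_LR = 5e-5


def total_optimizer_steps(num_train_examples: int,
                          batch_size: int = BATCH_SIZE,
                          epochs: int = EPOCHS) -> int:
    """Optimizer updates for the whole run.

    Assumes one device, no gradient accumulation, and that the last partial
    batch of each epoch is kept rather than dropped.
    """
    steps_per_epoch = math.ceil(num_train_examples / batch_size)
    return steps_per_epoch * epochs


def linear_decay_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Learning rate at optimizer step `step` under linear decay to 0, no warmup.

    Assumption: this mirrors the Trainer's default schedule; the card does
    not confirm that the default was used.
    """
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps)


n_train = 40_000  # hypothetical training-set size
total = total_optimizer_steps(n_train)
print(total)  # 7500
for step in (0, total // 2, total):
    print(step, linear_decay_lr(step, total))
```

Under these assumptions the learning rate starts at 5e-5, is halved midway through the third-of-three epochs boundary at step 3,750, and reaches zero at the final step.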

Evaluation

The model was tested on a held-out validation set of 9,917 reviews.

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Negative (0) | 0.93 | 0.93 | 0.93 | 4,939 |
| Positive (1) | 0.93 | 0.93 | 0.93 | 4,978 |

Overall

  • Accuracy: 93%

  • Macro Avg F1: 0.93

  • Weighted Avg F1: 0.93
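To make the figures above concrete: per-class precision, recall and F1 come from true/false positive and negative counts; macro F1 averages the per-class F1 scores equally, while weighted F1 weights them by support. Because both classes score 0.93 and their supports (4,939 and 4,978) are nearly equal, both averages also land at 0.93. The standard-library sketch below computes these quantities; the card does not say which tool produced its report, and `classification_summary` is a hypothetical helper run on toy data.

```python
from collections import Counter


def classification_summary(y_true, y_pred, labels=(0, 1)):
    """Per-class precision/recall/F1 plus accuracy, macro F1 and weighted F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    support = Counter(y_true)
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall,
                        "f1": f1, "support": support[c]}
    total_support = sum(support[c] for c in labels)
    return {
        "per_class": per_class,
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        # Macro: every class counts equally
        "macro_f1": sum(m["f1"] for m in per_class.values()) / len(labels),
        # Weighted: each class counts in proportion to its support
        "weighted_f1": sum(m["f1"] * m["support"] for m in per_class.values()) / total_support,
    }


# Toy example (hypothetical labels, not the card's evaluation data)
report = classification_summary([0, 0, 0, 1], [0, 0, 1, 1])
print(report["accuracy"])  # 0.75
```

On this deliberately imbalanced toy set the two averages diverge (macro ≈ 0.733, weighted ≈ 0.767), which shows why they coincide only when classes are balanced or score alike, as in the table above.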

How to Use

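```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "YamenRM/distilbert-sentiment-classifier"

# Load the fine-tuned tokenizer and classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# The pipeline handles tokenization, inference, and softmax over the two classes
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

print(classifier("I really loved this movie, it was amazing!"))
# Example output; the label string comes from the model config's id2label mapping
# [{'label': 'POSITIVE', 'score': 0.98}]
```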
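The label string returned by the pipeline comes from the checkpoint's `id2label` config. If that mapping was never customized, Transformers falls back to generic names such as `LABEL_0` and `LABEL_1`. A small helper like the hypothetical `readable_label` below normalizes either form to the label names used in this card; the example outputs are made up for illustration.

```python
# Label names documented in this card: 0 -> Negative, 1 -> Positive
ID2NAME = {0: "Negative", 1: "Positive"}


def readable_label(raw_label: str) -> str:
    """Normalize a pipeline label ('LABEL_1', 'POSITIVE', 'negative', ...) to a card label name."""
    label = raw_label.strip()
    if label.upper().startswith("LABEL_"):
        # Generic name: the suffix is the class index
        return ID2NAME[int(label.split("_", 1)[1])]
    name = label.capitalize()
    if name not in ID2NAME.values():
        raise ValueError(f"Unexpected label: {raw_label!r}")
    return name


# Hypothetical pipeline outputs, shaped like the list of dicts the pipeline returns
outputs = [{"label": "LABEL_1", "score": 0.98}, {"label": "NEGATIVE", "score": 0.87}]
print([readable_label(o["label"]) for o in outputs])  # ['Positive', 'Negative']
```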