BERT-Based Sentiment Analysis Models
Model Description
This repository contains two versions of BERT-based models fine-tuned for sentiment analysis tasks:
- BERT-1: Fine-tuned on the IMDB movie reviews dataset.
- BERT-2: Fine-tuned on a combined dataset of IMDB movie reviews and Twitter comments.
Both models are based on the bert-base-uncased pre-trained model from Hugging Face's Transformers library.
Intended Use
These models are intended for binary sentiment analysis of English text data. They can be used to classify text into positive or negative sentiment categories.
Loading the Models
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load BERT-1 (stored in the "bert-1" subfolder of the repository;
# a Hub repo ID can only be "user/name", so the subfolder is passed separately)
tokenizer_bert1 = AutoTokenizer.from_pretrained("verneylmavt/bert-base-uncased_sentiment-analysis", subfolder="bert-1")
model_bert1 = AutoModelForSequenceClassification.from_pretrained("verneylmavt/bert-base-uncased_sentiment-analysis", subfolder="bert-1")
# Load BERT-2 (stored in the "bert-2" subfolder)
tokenizer_bert2 = AutoTokenizer.from_pretrained("verneylmavt/bert-base-uncased_sentiment-analysis", subfolder="bert-2")
model_bert2 = AutoModelForSequenceClassification.from_pretrained("verneylmavt/bert-base-uncased_sentiment-analysis", subfolder="bert-2")
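Depending on how the checkpoints were saved, human-readable label names may or may not be embedded in the model config. If the pipeline below reports generic LABEL_0/LABEL_1 classes, you can attach names yourself; the index order used here (0 = negative, 1 = positive) is an assumption and should be verified against the training code:
# Assumed mapping -- verify that index 0 is negative and 1 is positive
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {v: k for k, v in id2label.items()}
for m in (model_bert1, model_bert2):
    m.config.id2label = id2label
    m.config.label2id = label2id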
Performing Sentiment Analysis
from transformers import pipeline
# Initialize pipelines
sentiment_pipeline_bert1 = pipeline("sentiment-analysis", model=model_bert1, tokenizer=tokenizer_bert1)
sentiment_pipeline_bert2 = pipeline("sentiment-analysis", model=model_bert2, tokenizer=tokenizer_bert2)
# Sample text
text = "I absolutely loved this product! It exceeded my expectations."
# Get predictions
result_bert1 = sentiment_pipeline_bert1(text)
result_bert2 = sentiment_pipeline_bert2(text)
print("BERT-1 Prediction:", result_bert1)
print("BERT-2 Prediction:", result_bert2)
Training Details
BERT-1
- Dataset: IMDB Movie Reviews Dataset
- Objective: Binary sentiment classification (positive/negative)
- Optimizer: AdamW with a learning rate lr (value unspecified; see the sketch after this list)
- Scheduler: Linear scheduler with warmup (get_linear_schedule_with_warmup)
- Epochs: num_epochs = 3
- Device: Trained on GPU if available
- Metrics Monitored: Training loss, training accuracy, testing accuracy per epoch
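The full training script is not reproduced in this card. The following is a minimal sketch of the setup above (AdamW plus a linear warmup schedule over num_epochs = 3); the learning rate, warmup length, and train_loader DataLoader are placeholder assumptions:
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model_bert1.to(device)

num_epochs = 3
optimizer = AdamW(model.parameters(), lr=2e-5)  # lr unspecified in the card; 2e-5 is a common default
num_training_steps = num_epochs * len(train_loader)  # train_loader: hypothetical DataLoader over IMDB
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=num_training_steps  # warmup length unspecified
)

for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss  # labels are included in the batch
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()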
BERT-2
- Dataset: Combined IMDB movie reviews and Twitter comments dataset
- Objective: Binary sentiment classification (positive/negative)
- Optimizer: AdamW with weight decay (0.01), applied only to parameters requiring gradients
- Scheduler: Linear scheduler with warmup (10% of total steps)
- Gradient Clipping: Applied with max_norm=1.0 (see the sketch after this list)
- Early Stopping: Implemented with a patience of 2 epochs without improvement in validation loss
- Epochs: num_epochs = 3, though training may stop early due to early stopping
- Device: Trained on GPU if available
- Metrics Monitored: Training loss, training accuracy, validation loss, validation accuracy per epoch
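Again as a sketch only: the BERT-2 additions (weight decay on trainable parameters, gradient clipping, patience-based early stopping) might look like the following, where the learning rate, train_loader, val_loader, and the evaluate helper are hypothetical stand-ins:
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model_bert2.to(device)

num_epochs = 3
# Optimize only parameters that require gradients, with weight decay 0.01
params = [p for p in model.parameters() if p.requires_grad]
optimizer = AdamW(params, lr=2e-5, weight_decay=0.01)  # lr is an assumption

total_steps = num_epochs * len(train_loader)  # train_loader: hypothetical DataLoader
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=int(0.1 * total_steps), num_training_steps=total_steps
)

best_val_loss, patience, epochs_without_improvement = float("inf"), 2, 0
for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss
        loss.backward()
        # Clip gradients to max_norm=1.0 before the optimizer step
        torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

    val_loss = evaluate(model, val_loader)  # hypothetical validation helper
    if val_loss < best_val_loss:
        best_val_loss, epochs_without_improvement = val_loss, 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break  # early stopping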
Limitations and Biases
- Data Bias: The models are trained on specific datasets, which may contain inherent biases such as demographic or cultural biases.
- Language Support: Only supports English language text.
- Generalization: Performance may degrade on text significantly different from the training data (e.g., slang, jargon).
- Ethical Considerations: Users should be cautious of potential biases in predictions and should not use the model for critical decisions without human oversight.
License
The models are distributed under the same license as the original bert-base-uncased model (Apache License 2.0).
Acknowledgements
- Thanks to the Hugging Face team for providing the Transformers library and model hosting.
- The IMDB dataset is made available by Maas et al. under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Disclaimer: The models are provided "as is" without warranty of any kind. The author is not responsible for any outcomes resulting from the use of these models.