BERT-Based Sentiment Analysis Models
Model Description
This repository contains two versions of BERT-based models fine-tuned for sentiment analysis tasks:
- BERT-1: Fine-tuned on the IMDB movie reviews dataset.
- BERT-2: Fine-tuned on a combined dataset of IMDB movie reviews and Twitter comments.
Both models are based on the bert-base-uncased pre-trained model from Hugging Face's Transformers library.
Intended Use
These models are intended for binary sentiment analysis of English text data. They can be used to classify text into positive or negative sentiment categories.
Loading the Models
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load BERT-1 (stored in the "bert-1" subfolder of the repository;
# a Hub repo ID can only be "user/name", so the subfolder is passed separately)
tokenizer_bert1 = AutoTokenizer.from_pretrained("verneylmavt/bert-base-uncased_sentiment-analysis", subfolder="bert-1")
model_bert1 = AutoModelForSequenceClassification.from_pretrained("verneylmavt/bert-base-uncased_sentiment-analysis", subfolder="bert-1")
# Load BERT-2 (stored in the "bert-2" subfolder)
tokenizer_bert2 = AutoTokenizer.from_pretrained("verneylmavt/bert-base-uncased_sentiment-analysis", subfolder="bert-2")
model_bert2 = AutoModelForSequenceClassification.from_pretrained("verneylmavt/bert-base-uncased_sentiment-analysis", subfolder="bert-2")
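Depending on how the checkpoints were saved, human-readable label names may or may not be embedded in the model config. If the pipeline below reports generic LABEL_0/LABEL_1 classes, you can attach names yourself; the index order used here (0 = negative, 1 = positive) is an assumption and should be verified against the training code:
# Assumed mapping -- verify that index 0 is negative and 1 is positive
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {v: k for k, v in id2label.items()}
for m in (model_bert1, model_bert2):
    m.config.id2label = id2label
    m.config.label2id = label2id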
Performing Sentiment Analysis
from transformers import pipeline
# Initialize pipelines
sentiment_pipeline_bert1 = pipeline("sentiment-analysis", model=model_bert1, tokenizer=tokenizer_bert1)
sentiment_pipeline_bert2 = pipeline("sentiment-analysis", model=model_bert2, tokenizer=tokenizer_bert2)
# Sample text
text = "I absolutely loved this product! It exceeded my expectations."
# Get predictions
result_bert1 = sentiment_pipeline_bert1(text)
result_bert2 = sentiment_pipeline_bert2(text)
print("BERT-1 Prediction:", result_bert1)
print("BERT-2 Prediction:", result_bert2)
Training Details
BERT-1
- Dataset: IMDB Movie Reviews Dataset
- Objective: Binary sentiment classification (positive/negative)
- Optimizer: AdamW with a learning rate lr (value unspecified; see the sketch after this list)
- Scheduler: Linear scheduler with warmup (get_linear_schedule_with_warmup)
- Epochs: num_epochs = 3
- Device: Trained on GPU if available
- Metrics Monitored: Training loss, training accuracy, testing accuracy per epoch
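The full training script is not reproduced in this card. The following is a minimal sketch of the setup above (AdamW plus a linear warmup schedule over num_epochs = 3); the learning rate, warmup length, and train_loader DataLoader are placeholder assumptions:
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model_bert1.to(device)

num_epochs = 3
optimizer = AdamW(model.parameters(), lr=2e-5)  # lr unspecified in the card; 2e-5 is a common default
num_training_steps = num_epochs * len(train_loader)  # train_loader: hypothetical DataLoader over IMDB
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=num_training_steps  # warmup length unspecified
)

for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss  # labels are included in the batch
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()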
BERT-2
- Dataset: Combined IMDB movie reviews and Twitter comments dataset
- Objective: Binary sentiment classification (positive/negative)
- Optimizer: AdamW with weight decay (0.01), applied only to parameters requiring gradients
- Scheduler: Linear scheduler with warmup (10% of total steps)
- Gradient Clipping: Applied with max_norm=1.0 (see the sketch after this list)
- Early Stopping: Implemented with a patience of 2 epochs without improvement in validation loss
- Epochs: num_epochs = 3, though training may stop early due to early stopping
- Device: Trained on GPU if available
- Metrics Monitored: Training loss, training accuracy, validation loss, validation accuracy per epoch
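Again as a sketch only: the BERT-2 additions (weight decay on trainable parameters, gradient clipping, patience-based early stopping) might look like the following, where the learning rate, train_loader, val_loader, and the evaluate helper are hypothetical stand-ins:
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model_bert2.to(device)

num_epochs = 3
# Optimize only parameters that require gradients, with weight decay 0.01
params = [p for p in model.parameters() if p.requires_grad]
optimizer = AdamW(params, lr=2e-5, weight_decay=0.01)  # lr is an assumption

total_steps = num_epochs * len(train_loader)  # train_loader: hypothetical DataLoader
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=int(0.1 * total_steps), num_training_steps=total_steps
)

best_val_loss, patience, epochs_without_improvement = float("inf"), 2, 0
for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss
        loss.backward()
        # Clip gradients to max_norm=1.0 before the optimizer step
        torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

    val_loss = evaluate(model, val_loader)  # hypothetical validation helper
    if val_loss < best_val_loss:
        best_val_loss, epochs_without_improvement = val_loss, 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break  # early stopping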
Limitations and Biases
- Data Bias: The models are trained on specific datasets, which may contain inherent biases such as demographic or cultural biases.
- Language Support: Only supports English language text.
- Generalization: Performance may degrade on text significantly different from the training data (e.g., slang, jargon).
- Ethical Considerations: Users should be cautious of potential biases in predictions and should not use the model for critical decisions without human oversight.
License
The models are distributed under the same license as the original bert-base-uncased model (Apache License 2.0).
Acknowledgements
- Thanks to the Hugging Face team for providing the Transformers library and model hosting.
- The IMDB dataset is made available by Maas et al. under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Disclaimer: The models are provided "as is" without warranty of any kind. The author is not responsible for any outcomes resulting from the use of these models.