# Attention-based Sentiment Classifier

This is an attention-based sentiment classifier: a bidirectional GRU with an attention mechanism that labels English text as positive or negative.
## Model Description
- Developed by: Lantian Wei
- Model type: Sentiment Classification
- Language(s): English
- License: GNU General Public License v3.0
- Finetuned from model: None; trained from scratch, using the pre-trained `bert-base-uncased` BERT tokenizer
This sentiment classifier uses a bidirectional GRU architecture with an attention mechanism to focus on the most sentiment-relevant parts of a sentence. The model was trained on the SST-2 (Stanford Sentiment Treebank) dataset, a collection of movie reviews with binary sentiment labels.
## Model Architecture

The network consists of the following components; a minimal PyTorch sketch follows the list.
- Embedding layer (100 dimensions)
- Bidirectional GRU (256 hidden dimensions)
- Attention mechanism
- Fully connected layers
- Output: 2 classes (positive/negative)
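The actual implementation lives in the repository's `models` package and isn't reproduced in this card. As a rough guide, a bidirectional GRU with a learned per-token attention scoring layer over the dimensions listed above might be wired up as follows; `AttentionGRUClassifier` and its layer names are illustrative assumptions, not the internals of `SentimentClassifierForHuggingFace`:

```python
import torch
import torch.nn as nn

class AttentionGRUClassifier(nn.Module):
    """Illustrative sketch of the architecture described above."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=256, num_classes=2, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Scores each GRU output state; softmax over the sequence yields attention weights
        self.attention = nn.Linear(2 * hidden_dim, 1)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, input_ids, return_attention=False):
        embedded = self.embedding(input_ids)               # (batch, seq, embed_dim)
        states, _ = self.gru(embedded)                     # (batch, seq, 2*hidden_dim)
        scores = self.attention(states).squeeze(-1)        # (batch, seq)
        weights = torch.softmax(scores, dim=1)             # one weight per token
        context = torch.bmm(weights.unsqueeze(1), states).squeeze(1)  # weighted sum of states
        logits = self.fc(self.dropout(context))            # (batch, num_classes)
        return (logits, weights) if return_attention else logits
```

Scoring each GRU state with a single linear layer keeps the weights one-per-token, which is what makes the attention heatmap in the usage example below possible.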
## Intended Uses & Limitations

### Intended Uses
- Sentiment analysis of short to medium-length English text
- Educational purposes to understand attention mechanisms
- Research on interpretability in NLP models
### Limitations

- Trained only on movie reviews; may not generalize to other domains
- Limited to English text
- Binary classification only (positive/negative)
- Not suitable for multilingual content
- Performance may degrade on texts significantly different from movie reviews
## Training Data
The model was trained on the SST-2 (Stanford Sentiment Treebank) dataset, which consists of movie reviews labeled as positive or negative. The dataset is commonly used as a benchmark for sentiment analysis models.
- Dataset: SST-2 from the GLUE benchmark
- Training examples: 30,000
- Validation examples: 500
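The card doesn't specify how the splits were prepared. One plausible way to obtain them with the Hugging Face `datasets` library is sketched below; trimming the official splits down to the 30,000/500 counts above by taking the first examples is an assumption, not a documented step:

```python
from datasets import load_dataset

# SST-2 as distributed with the GLUE benchmark
dataset = load_dataset("glue", "sst2")

# The official splits are larger (~67k train / 872 validation); trimming to the
# counts reported above. How the original subsets were chosen is not documented.
train_split = dataset["train"].select(range(30_000))
val_split = dataset["validation"].select(range(500))

print(train_split[0])  # {'sentence': '...', 'label': 0 or 1, 'idx': ...}
```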
## Training Procedure

### Training Hyperparameters
- Learning rate: 1e-3
- Epochs: 12
- Optimizer: Adam
- Loss function: Cross Entropy Loss
- Embedding dimension: 100
- Hidden dimension: 256
- Dropout: 0.3
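The training loop itself isn't included in the card. A minimal sketch that wires these hyperparameters together, assuming a `model` that maps `input_ids` to logits and a hypothetical `train_loader` yielding batches with `input_ids` and `labels`:

```python
import torch

# Learning rate, loss, and epoch count from the list above;
# embedding/hidden dimensions and dropout live in the model constructor.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(12):
    for batch in train_loader:  # hypothetical DataLoader over tokenized SST-2
        optimizer.zero_grad()
        logits = model(batch["input_ids"])
        loss = criterion(logits, batch["labels"])
        loss.backward()
        optimizer.step()
```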
## Evaluation Results
- Validation accuracy: [Insert your validation accuracy here]
- Test accuracy: [Insert your test accuracy here]
## Visualization Examples
One of the key features of this model is its interpretability through attention visualization. The model can output attention weights that highlight which parts of the input text it focused on to make its prediction.
## Usage Examples

```python
from transformers import AutoTokenizer
from models.huggingface_model import SentimentClassifierForHuggingFace, SentimentClassifierConfig
import torch
import matplotlib.pyplot as plt
import seaborn as sns

# Load the model
config = SentimentClassifierConfig()
model = SentimentClassifierForHuggingFace(config)
model.load_state_dict(torch.load("path_to_weights.pth"))
model.eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Function to make predictions with attention visualization
def predict_with_attention(text):
    # Tokenize
    tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    input_ids = tokens["input_ids"]

    # Get prediction and attention weights
    with torch.no_grad():
        outputs = model(input_ids, return_attention=True, return_dict=True)
        logits = outputs["logits"]
        attention_weights = outputs["attention_weights"]

    # Get prediction and confidence
    probs = torch.nn.functional.softmax(logits, dim=1)
    prediction = torch.argmax(probs, dim=1).item()
    confidence = probs[0][prediction].item()
    sentiment = "Positive" if prediction == 1 else "Negative"

    # Map token ids back to token strings for the axis labels
    tokens_list = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())

    # Reshape to (1, seq_len) so seaborn renders a single-row heatmap
    attention = attention_weights.squeeze().cpu().numpy().reshape(1, -1)

    # Plot attention heatmap
    plt.figure(figsize=(10, 2))
    sns.heatmap(
        attention,
        cmap="YlOrRd",
        annot=True,
        fmt=".2f",
        cbar=False,
        xticklabels=tokens_list,
        yticklabels=["Attention"],
    )
    plt.title(f"Prediction: {sentiment} (Confidence: {confidence:.4f})")
    plt.tight_layout()
    plt.show()

    return {
        "text": text,
        "sentiment": sentiment,
        "confidence": confidence,
        "attention": attention,
    }

# Example usage
result = predict_with_attention("I absolutely loved this movie! The acting was superb.")
print(f"Sentiment: {result['sentiment']} (Confidence: {result['confidence']:.4f})")
```
## Citations

```bibtex
@inproceedings{socher2013recursive,
  title={Recursive deep models for semantic compositionality over a sentiment treebank},
  author={Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D and Ng, Andrew Y and Potts, Christopher},
  booktitle={Proceedings of the 2013 conference on empirical methods in natural language processing},
  pages={1631--1642},
  year={2013}
}
```