# Attention-based Sentiment Classifier
This repository contains an attention-based sentiment classification model that demonstrates how attention mechanisms can enhance interpretability in NLP tasks.
## Model Overview
This model uses a bidirectional GRU with an attention mechanism to classify text sentiment (positive/negative). The attention mechanism lets the model focus on the most relevant parts of the input text, providing insight into which words most influence the classification.
## Key Features
- Bidirectional GRU architecture
- Additive attention mechanism for interpretability (see the formula after this list)
- Binary sentiment classification (positive/negative)
- Visualization tools for attention weights
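
For reference, additive (Bahdanau-style) attention over the BiGRU hidden states $h_1, \dots, h_T$ is usually written as follows; this is the standard formulation, and the repository's exact parameterization may differ slightly:

$$
e_t = v^\top \tanh(W h_t + b), \qquad
\alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{T} \exp(e_k)}, \qquad
c = \sum_{t=1}^{T} \alpha_t h_t
$$

The weights $\alpha_t$ are what the visualization tools display, and the context vector $c$ is fed to the classifier head.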
## Quick Start
```python
from transformers import pipeline
import matplotlib.pyplot as plt
import seaborn as sns

# Load the model directly from Hugging Face
classifier = pipeline(
    "text-classification",
    model="ericwei/attention-sentiment-classifier"
)

# Standard prediction
result = classifier("I absolutely loved this movie! The acting was superb.")
print(f"Sentiment: {result[0]['label']}, Score: {result[0]['score']:.4f}")

# For attention visualization, use the model directly
from transformers import AutoTokenizer, AutoModel
import torch

# If the repository ships custom modeling code, from_pretrained may also
# need trust_remote_code=True
tokenizer = AutoTokenizer.from_pretrained("ericwei/attention-sentiment-classifier")
model = AutoModel.from_pretrained("ericwei/attention-sentiment-classifier")

text = "I absolutely loved this movie! The acting was superb."
inputs = tokenizer(text, return_tensors="pt")

# Get prediction with attention weights
model.eval()
with torch.no_grad():
    outputs = model(inputs["input_ids"], return_attention=True, return_dict=True)

# Read off the prediction and the attention weights
logits = outputs["logits"]
attention_weights = outputs["attention_weights"]

# Visualize attention over the input tokens
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
plt.figure(figsize=(10, 2))
sns.heatmap(
    attention_weights.squeeze(0).cpu().numpy().reshape(1, -1),
    cmap="YlOrRd",
    annot=True,
    fmt=".2f",
    cbar=False,
    xticklabels=tokens,
    yticklabels=["Attention"]
)
plt.xticks(rotation=45, ha="right", rotation_mode="anchor")
plt.title("Attention Weights Visualization")
plt.tight_layout()
plt.show()
```
## Demo App
This model includes a Streamlit demo app that can be launched directly on Hugging Face Spaces.
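
The demo's source is not reproduced here; as an illustration, a minimal Streamlit app wrapping the pipeline could look like the sketch below (the file name `app.py` and the layout are assumptions, not the actual Space code):

```python
# app.py -- hypothetical minimal demo, not the actual Space code
import streamlit as st
from transformers import pipeline

@st.cache_resource  # load the model once and reuse it across reruns
def load_classifier():
    return pipeline(
        "text-classification",
        model="ericwei/attention-sentiment-classifier"
    )

st.title("Attention-based Sentiment Classifier")
text = st.text_area("Enter a movie review:", "I absolutely loved this movie!")

if st.button("Classify"):
    result = load_classifier()(text)[0]
    st.write(f"**Sentiment:** {result['label']} (score {result['score']:.4f})")
```

Run it locally with `streamlit run app.py`.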
## Model Architecture
The model consists of four components (sketched in code after this list):
- Embedding Layer: Converts token IDs to dense vectors
- Bidirectional GRU: Processes the text in both directions
- Attention Mechanism: Focuses on the most relevant parts of the text
- Classifier Head: Makes the final sentiment prediction
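
A minimal PyTorch sketch of this stack follows. The class and attribute names are illustrative rather than the repository's actual module, and the dimensions follow the hyperparameters listed under Training:

```python
import torch
import torch.nn as nn

class AttentionSentimentClassifier(nn.Module):
    """Illustrative re-implementation of the architecture described above."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)    # token IDs -> dense vectors
        self.gru = nn.GRU(embed_dim, hidden_dim,
                          batch_first=True, bidirectional=True) # reads the text in both directions
        self.attn_proj = nn.Linear(2 * hidden_dim, hidden_dim)  # additive attention: W h_t + b
        self.attn_score = nn.Linear(hidden_dim, 1, bias=False)  # ...then e_t = v^T tanh(.)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)  # final sentiment prediction

    def forward(self, input_ids, return_attention=False):
        h, _ = self.gru(self.embedding(input_ids))                # (batch, seq, 2*hidden)
        scores = self.attn_score(torch.tanh(self.attn_proj(h)))  # (batch, seq, 1)
        weights = torch.softmax(scores, dim=1)                    # alpha_t over the sequence
        context = (weights * h).sum(dim=1)                        # weighted sum of hidden states
        logits = self.classifier(context)
        if return_attention:
            return {"logits": logits, "attention_weights": weights.squeeze(-1)}
        return {"logits": logits}
```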
## Training
The model was trained on SST-2 (the binary Stanford Sentiment Treebank) with the following hyperparameters (a schematic training loop follows the list):
- Learning rate: 1e-3
- Epochs: 12
- Optimizer: Adam
- Loss function: Cross Entropy Loss
- Embedding dimension: 100
- Hidden dimension: 256
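
The training script itself is not included in this card; a schematic loop that wires these hyperparameters together (using the illustrative model sketch above and an assumed `train_loader` yielding tokenized SST-2 batches) would look roughly like:

```python
import torch
import torch.nn as nn

# `model` is the sketch from Model Architecture; `train_loader` is assumed to
# yield (input_ids, labels) batches from tokenized SST-2 -- both hypothetical.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, learning rate 1e-3
criterion = nn.CrossEntropyLoss()                          # cross-entropy loss

for epoch in range(12):                                    # 12 epochs
    model.train()
    for input_ids, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(input_ids)["logits"], labels)
        loss.backward()
        optimizer.step()
```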
## Limitations
- Trained only on movie reviews, so it may not generalize to other domains
- Limited to English text; not suitable for multilingual content
- Binary classification only (positive/negative)
- Performance may degrade on texts that differ significantly from movie reviews
## Citation
If you use this model, please cite:
```bibtex
@misc{attention-sentiment-classifier,
  author       = {Lantian Wei},
  title        = {Attention-based Sentiment Classifier},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ericwei/attention-sentiment-classifier}}
}
```
## License
This model is licensed under the GNU General Public License v3.0.