---
license: apache-2.0
datasets:
- agentlans/twitter-sentiment-meta-analysis
language:
- en
base_model:
- microsoft/deberta-v3-xsmall
pipeline_tag: text-classification
---

# DeBERTa-v3 Twitter Sentiment Models

This page contains one of two DeBERTa-v3 models (xsmall and base) fine-tuned for Twitter sentiment regression.

## Model Details

- **Model Architecture**: DeBERTa-v3
- **Variants**:
  - xsmall
  - base
- **Task**: Sentiment regression
- **Language**: English
- **License**: Apache 2.0

## Intended Use

These models are designed for fine-grained sentiment analysis of English tweets. They output a **continuous sentiment score** rather than discrete categories:

- A negative score indicates negative sentiment.
- A score of zero indicates neutral sentiment.
- A positive score indicates positive sentiment.
- The absolute value of the score indicates how strong the sentiment is.

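For applications that need discrete labels, the continuous score can be bucketed after the fact. The following is a minimal sketch, not part of the released models: the helper `score_to_label` and the `neutral_band` threshold of 0.5 are hypothetical and should be tuned on your own data.

```python
def score_to_label(score: float, neutral_band: float = 0.5) -> str:
    """Map a continuous sentiment score to a coarse label.

    `neutral_band` is an illustrative threshold (an assumption, not a value
    from this model card): scores within +/- neutral_band count as neutral.
    """
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# Scores taken from the example output in "How to use"
for s in (-2.28, 0.06, 2.66):
    print(f"{s:+.2f} -> {score_to_label(s)} (strength {abs(s):.2f})")
```

Keeping the raw score alongside the label preserves the strength information that a three-way label discards.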
## Training Data

The models were fine-tuned on a dataset of English tweets collected between September 2009 and January 2010. The sentiment scores were derived from a meta-analysis that combined the outputs of 10 different sentiment classifiers using principal component analysis. The dataset is available at [agentlans/twitter-sentiment-meta-analysis](https://huggingface.co/datasets/agentlans/twitter-sentiment-meta-analysis).

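To illustrate how several classifier outputs can be collapsed into one score with principal component analysis, here is a minimal sketch using NumPy on synthetic data. It is not the dataset author's actual pipeline: the simulated classifier outputs, the standardization step, and the sign convention are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 1000 "tweets" scored by 10 noisy "classifiers"
# that all track one hidden sentiment value (assumption for illustration).
n_tweets, n_classifiers = 1000, 10
latent = rng.normal(size=n_tweets)
outputs = latent[:, None] + 0.5 * rng.normal(size=(n_tweets, n_classifiers))

# Standardize each classifier's output so no single scale dominates.
z = (outputs - outputs.mean(axis=0)) / outputs.std(axis=0)

# PCA via SVD: the rows of vt are the principal axes.
_, _, vt = np.linalg.svd(z, full_matrices=False)
first_axis = vt[0]

# The sign of a principal axis is arbitrary; orient it so that higher
# classifier outputs give a higher combined score.
if first_axis.sum() < 0:
    first_axis = -first_axis

combined = z @ first_axis  # one consensus score per tweet
print(combined.shape)
print(round(float(np.corrcoef(combined, latent)[0, 1]), 2))
```

The first principal component captures the direction on which the classifiers agree most, which is why it can serve as a consensus score in a setup like this one.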
## How to use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "agentlans/deberta-v3-xsmall-tweet-sentiment"

# Load the tokenizer and model, then move the model to the GPU if available, otherwise the CPU
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

def sentiment(text):
    """Run the model on a string or list of strings and return its logits.

    The logits are interpreted as sentiment scores, one per input text.
    """
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
    with torch.no_grad():
        # squeeze(-1) drops only the label dimension, so a single input still yields a list
        logits = model(**inputs).logits.squeeze(-1).cpu()
    return logits.tolist()

# Example usage
text = [x.strip() for x in """
I absolutely despise this product and regret ever purchasing it.
The service at that restaurant was terrible and ruined our entire evening.
I'm feeling a bit under the weather today, but it's not too bad.
The weather is quite average today, neither good nor bad.
The movie was okay, I didn't love it but I didn't hate it either.
I'm looking forward to the weekend, it should be nice to relax.
This new coffee shop has a really pleasant atmosphere and friendly staff.
I'm thrilled with my new job and the opportunities it presents!
The concert last night was absolutely incredible, easily the best I've ever seen.
I'm overjoyed and grateful for all the love and support from my friends and family.
""".strip().split("\n")]

for x, s in zip(text, sentiment(text)):
    print(f"Text: {x}\nSentiment: {round(s, 2)}\n")
```

Output:

```text
Text: I absolutely despise this product and regret ever purchasing it.
Sentiment: -2.28

Text: The service at that restaurant was terrible and ruined our entire evening.
Sentiment: -2.38

Text: I'm feeling a bit under the weather today, but it's not too bad.
Sentiment: 0.25

Text: The weather is quite average today, neither good nor bad.
Sentiment: -0.14

Text: The movie was okay, I didn't love it but I didn't hate it either.
Sentiment: 0.06

Text: I'm looking forward to the weekend, it should be nice to relax.
Sentiment: 2.06

Text: This new coffee shop has a really pleasant atmosphere and friendly staff.
Sentiment: 2.48

Text: I'm thrilled with my new job and the opportunities it presents!
Sentiment: 2.66

Text: The concert last night was absolutely incredible, easily the best I've ever seen.
Sentiment: 2.68

Text: I'm overjoyed and grateful for all the love and support from my friends and family.
Sentiment: 2.65
```

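When scoring many tweets, passing them all in one call pads every input into a single batch, which can exhaust memory. A minimal sketch of chunked scoring follows. The helper `score_in_batches` and its `batch_size` default are hypothetical, and `score_fn` is assumed to behave like the `sentiment` function above (a list of strings in, scores out).

```python
from typing import Callable, List, Sequence, Union


def score_in_batches(
    texts: Sequence[str],
    score_fn: Callable[[List[str]], Union[float, List[float]]],
    batch_size: int = 32,
) -> List[float]:
    """Score texts in fixed-size chunks and concatenate the results."""
    scores: List[float] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        result = score_fn(batch)
        # Some scorers return a bare float for a one-item batch; normalize to a list.
        scores.extend([result] if isinstance(result, float) else result)
    return scores


# Stand-in scorer so the sketch runs without downloading the model;
# in practice, pass the `sentiment` function defined above.
def dummy_score(batch: List[str]) -> List[float]:
    return [float(len(t)) for t in batch]


print(score_in_batches(["a", "bb", "ccc"], dummy_score, batch_size=2))
```

A suitable `batch_size` depends on the available memory and on tweet length, so treat the default here as a starting point only.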
## Performance

Evaluation set RMSE:

- xsmall: 0.2560
- base: 0.1938

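For reference, RMSE (root mean squared error) is the square root of the mean squared difference between predicted and reference scores, so lower is better. A minimal sketch of the computation, using made-up numbers rather than the actual evaluation data:

```python
import math


def rmse(predicted, reference):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only, not taken from the evaluation set
print(rmse([2.5, -1.0, 0.3], [2.0, -1.5, 0.0]))
```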
## Limitations

- English language only
- Trained specifically on tweets; may not generalize well to other text types
- No access to broader context beyond the individual tweet
- May struggle to detect sarcasm or nuanced sentiment

## Ethical Considerations

- Potential biases in the training data related to the time period and Twitter user demographics
- Risk of misuse for large-scale sentiment monitoring without consent