|
--- |
|
language: it |
|
license: mit |
|
tags: |
|
- sentiment |
|
- Italian |
|
--- |
|
|
|
# FEEL-IT: Emotion and Sentiment Classification for the Italian Language |
|
## Abstract |
|
|
|
Sentiment analysis is a common task to understand people's reactions online. Still, we often need more nuanced information: is the post negative because the user is angry or because they are sad? |
|
Many approaches have been introduced to tackle both tasks, but, at least for Italian, they all address only one task at a time. We introduce *FEEL-IT*, a novel benchmark corpus of Italian Twitter posts annotated with four basic emotions: **anger, fear, joy, sadness**. By collapsing these classes, we can also perform **sentiment analysis**. We evaluate our corpus on benchmark datasets for both emotion and sentiment classification, obtaining competitive results.
|
We release an [open-source Python library](https://github.com/MilaNLProc/feel-it), so researchers can use a model trained on FEEL-IT for inferring both sentiments and emotions from Italian text. |
|
|
|
| Model | Download |
| ------ | ------------------------- |
| `feel-it-italian-sentiment` | [Link](https://huggingface.co/MilaNLProc/feel-it-italian-sentiment) |
| `feel-it-italian-emotion` | [Link](https://huggingface.co/MilaNLProc/feel-it-italian-emotion) |
|
|
|
|
|
## Model |
|
|
|
The *feel-it-italian-sentiment* model performs **sentiment analysis** on Italian text. We fine-tuned the [UmBERTo model](https://huggingface.co/Musixmatch/umberto-commoncrawl-cased-v1) on our new dataset (i.e., FEEL-IT), obtaining state-of-the-art performance on different benchmark corpora.
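
For a quick smoke test, the model can also be loaded through the `transformers` pipeline API. This is a minimal sketch; the exact label strings in the output depend on the model's configuration (`id2label`):

```python
from transformers import pipeline

# Sentiment classifier backed by the FEEL-IT fine-tuned UmBERTo checkpoint
classifier = pipeline("text-classification", model="MilaNLProc/feel-it-italian-sentiment")

# "Today I am really happy!"
result = classifier("Oggi sono proprio contento!")
print(result)  # e.g. [{'label': ..., 'score': ...}]
```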
|
|
|
## Data |
|
|
|
Our data has been collected by annotating tweets from a broad range of topics. In total, we have 2037 tweets annotated with an emotion label. More details can be found in our paper (preprint available soon). |
|
|
|
## Performance |
|
|
|
We evaluate our performance using [SENTIPOLC16 Evalita](http://www.di.unito.it/~tutreeb/sentipolc-evalita16/). We collapsed the four FEEL-IT classes into two by mapping *joy* to the *positive* class and *anger*, *fear*, and *sadness* to the *negative* class. We then compare three training configurations, training on FEEL-IT, on SENTIPOLC16, or on both, and test each on the SENTIPOLC16 test set.
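
This collapsing step can be written as a simple mapping (an illustrative sketch, not part of the released code):

```python
# Collapse the four FEEL-IT emotion labels into two sentiment classes
EMOTION_TO_SENTIMENT = {
    "joy": "positive",
    "anger": "negative",
    "fear": "negative",
    "sadness": "negative",
}

def collapse(emotion: str) -> str:
    """Map an emotion label to its binary sentiment label."""
    return EMOTION_TO_SENTIMENT[emotion]
```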
|
|
|
|
|
Since SENTIPOLC16 comes with both a training set and a test set, we can directly compare the effect of different training data on the same test set.
|
|
|
In every configuration we use the fine-tuned UmBERTo model. The results show that training on FEEL-IT yields better performance on the SENTIPOLC16 test set than training on the SENTIPOLC16 training set itself.
|
|
|
| Training Dataset | Macro-F1 | Accuracy |
| ------ | ------ | ------ |
| SENTIPOLC16 | 0.80 | 0.81 |
| FEEL-IT | **0.81** | **0.84** |
| FEEL-IT + SENTIPOLC16 | 0.81 | 0.82 |
|
|
|
## Usage |
|
|
|
```python |
|
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("MilaNLProc/feel-it-italian-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("MilaNLProc/feel-it-italian-sentiment")

# "Today I am really happy!"
sentence = "Oggi sono proprio contento!"
inputs = tokenizer(sentence, return_tensors="pt")

# Run the model and get the logits (no gradients needed at inference time)
with torch.no_grad():
    logits = model(**inputs).logits.squeeze(0)

# Turn the logits into probabilities
proba = torch.nn.functional.softmax(logits, dim=0)

# Unpack the tensor to obtain the negative and positive probabilities
negative, positive = proba
print(f"Probabilities: Negative {np.round(negative.item(), 4)} - Positive {np.round(positive.item(), 4)}")
|
``` |
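
Alternatively, the [feel-it library](https://github.com/MilaNLProc/feel-it) mentioned above wraps both models behind a small interface. The sketch below follows the library's README; check the repository for the current API:

```python
from feel_it import EmotionClassifier, SentimentClassifier

sentiment_classifier = SentimentClassifier()
emotion_classifier = EmotionClassifier()

sentences = [
    "Oggi sono proprio contento!",  # "Today I am really happy!"
    "Uffa, che giornata storta.",   # "Ugh, what a rotten day."
]

# predict() takes a list of Italian sentences and returns one label per sentence
print(sentiment_classifier.predict(sentences))
print(emotion_classifier.predict(sentences))
```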
|
|
|
## Citation |
|
Please use the following BibTeX entry if you use this model in your project:
|
``` |
|
@inproceedings{bianchi2021feel,
    title = "{FEEL-IT}: Emotion and Sentiment Classification for the Italian Language",
    author = "Bianchi, Federico and Nozza, Debora and Hovy, Dirk",
    booktitle = "Proceedings of the 11th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis",
    year = "2021",
    publisher = "Association for Computational Linguistics",
}
|
``` |