File size: 4,060 Bytes

092a428
7a76c4f
 
0e045a8
7a76c4f
 
 
079ee57
092a428
 
ada1ec8
092a428
e06f86e
092a428
 
 
46f4d34
 
87ce035
75b9a09
a51ab62
75b9a09
0a8baea
 
 
 
 
 
 
75b9a09
6e991b4
04392f3
87ce035
75b9a09
f7dc7cd
c1f87b8
 
 
5e453b0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
f7dc7cd
c1f87b8
80a6c9d
 
87ce035
80a6c9d
85573e7
93618c7
5a9e9a6
 
b0e7e96
87ce035
71b3fd2
87ce035
 
71b3fd2
 
 
 
 
 
 
87ce035
71b3fd2
87ce035
71b3fd2
87ce035
 
 
 
5a9e9a6
56a10c2
5a9e9a6
155a92b
 
4f2ad24
ab82646
67e2b84
4f2ad24
bcb853b
7f8f33e
 
88432a0

---
language: "en"
tags:
- distilroberta
- sentiment
- emotion
- twitter
- reddit

widget:
- text: "Oh wow. I didn't know that."
- text: "This movie always makes me cry.."
- text: "Oh Happy Day"

---

# Emotion English DistilRoBERTa-base

# Description ℹ

With this model, you can classify emotions in English text data. The model was trained on 6 diverse datasets (see Appendix below) and predicts Ekman's 6 basic emotions, plus a neutral class:

1) anger 🤬
2) disgust 🤢
3) fear 😨
4) joy 😀
5) neutral 😐
6) sadness 😭
7) surprise 😲

The model is a fine-tuned checkpoint of [DistilRoBERTa-base](https://huggingface.co/distilroberta-base). For a 'non-distilled' emotion model, please refer to the model card of the [RoBERTa-large](https://huggingface.co/j-hartmann/emotion-english-roberta-large) version.

# Application 🚀

a) Run emotion model with 3 lines of code on single text example using Hugging Face's pipeline command on Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/j-hartmann/emotion-english-distilroberta-base/blob/main/simple_emotion_pipeline.ipynb)

```python
from transformers import pipeline
classifier = pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base", return_all_scores=True)
classifier("I love this!")
```

```python
Output:
[[{'label': 'anger', 'score': 0.004419783595949411},
  {'label': 'disgust', 'score': 0.0016119900392368436},
  {'label': 'fear', 'score': 0.0004138521908316761},
  {'label': 'joy', 'score': 0.9771687984466553},
  {'label': 'neutral', 'score': 0.005764586851000786},
  {'label': 'sadness', 'score': 0.002092392183840275},
  {'label': 'surprise', 'score': 0.008528684265911579}]]
```

b) Run emotion model on multiple examples and full datasets (e.g., .csv files) on Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/j-hartmann/emotion-english-distilroberta-base/blob/main/emotion_prediction_example.ipynb)

# Contact 💻

Please reach out to [j.p.hartmann@rug.nl](mailto:j.p.hartmann@rug.nl) if you have any questions or feedback.

Thanks to Samuel Domdey and chrsiebert for their support in making this model available.

# Reference ✅

For attribution, please cite the following reference if you use this model. A working paper will be available soon.

```
Jochen Hartmann, "Emotion English DistilRoBERTa-base". https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/, 2022.
```

BibTex citation:

```
@misc{hartmann2022emotionenglish,
  author={Hartmann, Jochen},
  title={Emotion English DistilRoBERTa-base},
  year={2022},
  howpublished = {\url{https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/}},
}
```

# Appendix 📚

Please find an overview of the datasets used for training below. All datasets contain English text. The table summarizes which emotions are available in each of the datasets. The datasets represent a diverse collection of text types. Specifically, they contain emotion labels for texts from Twitter, Reddit, student self-reports, and utterances from TV dialogues. As MELD (Multimodal EmotionLines Dataset) extends the popular EmotionLines dataset, EmotionLines itself is not included here. 

|Name|anger|disgust|fear|joy|neutral|sadness|surprise|
|---|---|---|---|---|---|---|---|
|Crowdflower (2016)|Yes|-|-|Yes|Yes|Yes|Yes|
|Emotion Dataset, Elvis et al. (2018)|Yes|-|Yes|Yes|-|Yes|Yes|
|GoEmotions, Demszky et al. (2020)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
|ISEAR, Vikash (2018)|Yes|Yes|Yes|Yes|-|Yes|-|
|MELD, Poria et al. (2019)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
|SemEval-2018, EI-reg (Mohammad et al. 2018) |Yes|-|Yes|Yes|-|Yes|-|

The model is trained on a balanced subset from the datasets listed above (2,811 observations per emotion, i.e., nearly 20k observations in total). 80% of this balanced subset is used for training and 20% for evaluation. The evaluation accuracy is 66% (vs. the random-chance baseline of 1/7 = 14%).