---
language: "en"
tags:
- distilroberta
- sentiment
- emotion
- twitter
- reddit

widget:
- text: "Oh wow. I didn't know that."
- text: "This movie always makes me cry.."
- text: "Oh Happy Day"

---

## Description

With this model, you can classify emotions in English text data. The model was trained on 6 diverse datasets (see Appendix) and predicts Ekman's 6 basic emotions, plus a neutral class:

1) anger
2) disgust
3) fear
4) joy
5) neutral
6) sadness
7) surprise

The model is a fine-tuned checkpoint of DistilRoBERTa-base.

## Application

a) Run the emotion model on a single text example with 3 lines of code, using Hugging Face's pipeline function in Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/j-hartmann/emotion-english-distilroberta-base/blob/main/simple_emotion_pipeline.ipynb)
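For readers who prefer to run the model outside Colab, the following is a minimal sketch of the pipeline approach. It assumes the Hub model ID matches the GitHub repository name in the notebook link above, and that `transformers` and a backend such as `torch` are installed. The helper names `classify` and `top_emotion` are hypothetical and not part of the original notebook.

```python
from typing import List

# Assumption: the Hub model ID mirrors the GitHub repository name used in the Colab links.
MODEL_ID = "j-hartmann/emotion-english-distilroberta-base"


def top_emotion(scores: List[dict]) -> str:
    """Return the label with the highest score from one text's list of scores."""
    return max(scores, key=lambda item: item["score"])["label"]


def classify(texts: List[str], model_id: str = MODEL_ID) -> List[List[dict]]:
    """Score all emotion classes for each text.

    The model is downloaded on first use, so this call needs network access.
    """
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    # top_k=None asks the pipeline for the scores of every class, not only the best one.
    classifier = pipeline("text-classification", model=model_id, top_k=None)
    return classifier(texts)
```

Calling `classify(["Oh Happy Day"])` should return one list per input text, each holding a `label` and `score` entry for every emotion class; passing one of those lists to `top_emotion` picks the most likely class.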

b) Run the emotion model on multiple examples or full datasets (e.g., .csv files) in Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/j-hartmann/emotion-english-distilroberta-base/blob/main/emotion_prediction_example.ipynb)
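As a rough illustration of the dataset workflow, the sketch below labels a CSV file in batches using only the standard library for file handling. It is not the notebook's code: the function names `label_csv` and `make_predictor`, the output column name `emotion`, and the Hub model ID are all assumptions made for this example.

```python
import csv
from typing import Callable, List, TextIO

# Assumption: the Hub model ID mirrors the GitHub repository name used in the Colab links.
MODEL_ID = "j-hartmann/emotion-english-distilroberta-base"


def make_predictor(model_id: str = MODEL_ID) -> Callable[[List[str]], List[str]]:
    """Build a function that maps a batch of texts to their top emotion labels."""
    from transformers import pipeline  # lazy import; needs transformers and a backend

    classifier = pipeline("text-classification", model=model_id)
    # With default settings the pipeline returns the top label for each text.
    return lambda texts: [result["label"] for result in classifier(texts)]


def label_csv(
    src: TextIO,
    dst: TextIO,
    text_column: str,
    predict: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> int:
    """Copy rows from src to dst with an added `emotion` column; return the row count."""
    reader = csv.DictReader(src)
    rows = list(reader)
    fieldnames = list(reader.fieldnames or []) + ["emotion"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    # Predict in batches so large files do not reach the model in one call.
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        labels = predict([row[text_column] for row in batch])
        for row, label in zip(batch, labels):
            writer.writerow({**row, "emotion": label})
    return len(rows)
```

A typical call would open an input and an output file and pass `make_predictor()` as `predict`. Keeping the predictor as a parameter means the file handling can be checked with a stub function before the model is downloaded.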

## Contact

Please reach out to jochen.hartmann@uni-hamburg.de if you have any questions or feedback.

Thanks to Samuel Domdey and chrsiebert for their support in making this model available.

## Appendix

Please find an overview of the datasets used for training below. All datasets contain English text. The table summarizes which emotions are available in each of the datasets.

|Name|anger|disgust|fear|joy|neutral|sadness|surprise|
|---|---|---|---|---|---|---|---|
|Crowdflower (2016)|Yes|-|-|Yes|Yes|Yes|Yes|
|Emotion Dataset, Elvis et al. (2018)|Yes|-|Yes|Yes|-|Yes|Yes|
|GoEmotions, Demszky et al. (2020)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
|ISEAR, Vikash (2018)|Yes|Yes|Yes|Yes|-|Yes|-|
|MELD, Poria et al. (2019)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
|SemEval-2018, EI-reg, Mohammad et al. (2018)|Yes|-|Yes|Yes|-|Yes|-|

The datasets represent a diverse collection of text types. Specifically, they contain emotion labels for texts from Twitter, Reddit, student self-reports, and utterances from TV dialogues. As MELD (Multimodal EmotionLines Dataset) extends the popular EmotionLines dataset, EmotionLines itself is not included here.