README.md · j-hartmann/emotion-english-distilroberta-base

metadata

language: en
tags:
  - distilroberta
  - sentiment
  - emotion
  - twitter
  - reddit
widget:
  - text: Oh wow. I didn't know that.
  - text: This movie always makes me cry..
  - text: Oh Happy Day

Description ℹ

With this model, you can classify emotions in English text data. The model was trained on 6 diverse datasets (see Appendix below) and predicts Ekman's 6 basic emotions, plus a neutral class:

anger 🤬
disgust 🤢
fear 😨
joy 😀
neutral 😐
sadness 😭
surprise 😲

The model is a fine-tuned checkpoint of DistilRoBERTa-base.

Application 🚀

a) Run the emotion model with 3 lines of code on a single text example using Hugging Face's pipeline function on Google Colab:
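For readers working outside Colab, the following is a minimal local sketch of the same idea, not the notebook itself. It assumes the `transformers` library (with a backend such as PyTorch) is installed and recent enough to accept `top_k=None`. The helper names `classify` and `top_emotion` are hypothetical and introduced only for illustration.

```python
def top_emotion(scores):
    """Return the (label, score) pair with the highest score.

    `scores` is a list of {"label": ..., "score": ...} dicts, the shape
    the text-classification pipeline returns for one input text.
    """
    best = max(scores, key=lambda s: s["score"])
    return best["label"], best["score"]


def classify(text):
    """Hypothetical helper: score one text against all 7 classes."""
    # Imported lazily so that top_emotion() works without transformers installed.
    from transformers import pipeline

    classifier = pipeline(
        "text-classification",
        model="j-hartmann/emotion-english-distilroberta-base",
        top_k=None,  # assumption: return scores for every class, not just the best one
    )
    # Passing a list yields one result list per input text; take the first.
    return classifier([text])[0]


# Usage (downloads the model on first call):
#   scores = classify("Oh wow. I didn't know that.")
#   label, score = top_emotion(scores)
```

Building the pipeline inside `classify` keeps the sketch short; for repeated calls, it would make more sense to create the classifier once and reuse it.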

b) Run the emotion model on multiple examples and full datasets (e.g., .csv files) on Google Colab:
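A batch workflow over a .csv file could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the Colab notebook: the column name `text`, the output columns `emotion` and `score`, and the helpers `classify_csv` and `load_classifier` are all hypothetical. The classifier is passed in as a callable that maps a list of texts to one `{"label", "score"}` dict per text, which is what the `transformers` text-classification pipeline returns by default.

```python
import csv


def classify_csv(in_path, out_path, classifier, text_column="text", batch_size=32):
    """Classify every row of a CSV file and write the top label and score per row.

    `classifier` is any callable that takes a list of strings and returns
    one {"label": ..., "score": ...} dict per string.
    """
    with open(in_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    # Process the texts in fixed-size batches to keep memory use bounded.
    for start in range(0, len(rows), batch_size):
        batch_rows = rows[start:start + batch_size]
        results = classifier([row[text_column] for row in batch_rows])
        for row, result in zip(batch_rows, results):
            row["emotion"] = result["label"]
            row["score"] = result["score"]

    fieldnames = list(rows[0].keys()) if rows else [text_column, "emotion", "score"]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows


def load_classifier():
    """Hypothetical helper: build the pipeline for this model (needs transformers)."""
    from transformers import pipeline

    return pipeline(
        "text-classification",
        model="j-hartmann/emotion-english-distilroberta-base",
    )


# Usage (file names are placeholders):
#   classify_csv("reviews.csv", "reviews_with_emotions.csv", load_classifier())
```

Taking the classifier as an argument means the file handling can be tried out with a stub function before the model is downloaded.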

Contact 💻

Please reach out to jochen.hartmann@uni-hamburg.de if you have any questions or feedback.

Thanks to Samuel Domdey and chrsiebert for their support in making this model available.

Appendix 📚

Please find an overview of the datasets used for training below. All datasets contain English text. The table summarizes which emotions are available in each of the datasets.

Name	anger	disgust	fear	joy	neutral	sadness	surprise
Crowdflower (2016)	Yes	-	-	Yes	Yes	Yes	Yes
Emotion Dataset, Elvis et al. (2018)	Yes	-	Yes	Yes	-	Yes	Yes
GoEmotions, Demszky et al. (2020)	Yes	Yes	Yes	Yes	Yes	Yes	Yes
ISEAR, Vikash (2018)	Yes	Yes	Yes	Yes	-	Yes	-
MELD, Poria et al. (2019)	Yes	Yes	Yes	Yes	Yes	Yes	Yes
SemEval-2018, EI-reg (Mohammad et al. 2018)	Yes	-	Yes	Yes	-	Yes	-

The datasets represent a diverse collection of text types. Specifically, they contain emotion labels for texts from Twitter, Reddit, student self-reports, and utterances from TV dialogues. As MELD (Multimodal EmotionLines Dataset) extends the popular EmotionLines dataset, EmotionLines itself is not included here.

The model is trained on a balanced subset of the datasets listed above (2,811 observations per emotion, i.e., nearly 20k observations in total). The evaluation accuracy on a holdout test set is 66%, which is significantly above the random-chance baseline of 1/7 ≈ 14%.
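The figures above can be checked with a few lines of arithmetic, and the idea of a balanced subset can be illustrated with a small sampling helper. This is only a sketch: the source does not describe how the balanced subset was actually drawn, so `balanced_subset` (random downsampling to a fixed count per label) is a hypothetical stand-in, not the author's procedure.

```python
import random
from collections import defaultdict

EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]
PER_CLASS = 2811  # observations per emotion, as stated in the text

total = PER_CLASS * len(EMOTIONS)  # 2,811 * 7 = 19,677, i.e. nearly 20k
chance = 1 / len(EMOTIONS)         # uniform guessing over 7 classes, about 0.143


def balanced_subset(rows, per_class, seed=0):
    """Hypothetical helper: sample `per_class` (text, label) pairs for each label."""
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    subset = []
    for label, items in by_label.items():
        if len(items) < per_class:
            raise ValueError(f"not enough examples for label {label!r}")
        subset.extend(rng.sample(items, per_class))
    return subset
```

With a balanced test set, always guessing one class or guessing uniformly at random both yield about 14% accuracy, which is the baseline the 66% figure is compared against.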