metadata
language:
  - ru
tags:
  - russian
  - classification
  - sentiment
  - emotion-classification
  - multiclass
datasets:
  - cedr
widget:
  - text: Бесишь меня, падла
  - text: Как здорово, что все мы здесь сегодня собрались
  - text: Как-то стрёмно, давай свалим отсюда?
  - text: Грусть-тоска меня съедает
  - text: Данный фрагмент текста не содержит абсолютно никаких эмоций
  - text: Нифига себе, неужели так тоже бывает!

This is the cointegrated/rubert-tiny2 model fine-tuned for classification of emotions in Russian sentences. The task is multilabel classification, because one sentence can contain multiple emotions.
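Because the task is multilabel, each label is scored independently rather than through a softmax over all labels. The sketch below illustrates that decoding step on raw logits. It is a minimal illustration, not the code used for this model: the label order is assumed from the results table in this card, the 0.5 threshold is an arbitrary assumption, and the logits are made-up numbers rather than real model outputs.

```python
import math

# Label names taken from the results table in this card; their order is an assumption.
LABELS = ["no emotion", "joy", "sadness", "surprise", "fear", "anger"]


def decode_multilabel(logits, threshold=0.5):
    """Apply an independent sigmoid to each logit and keep labels at or above the threshold.

    The threshold of 0.5 is an illustrative default, not a value from the model card.
    """
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    predicted = [label for label, p in zip(LABELS, probs) if p >= threshold]
    return dict(zip(LABELS, probs)), predicted


# Made-up logits for illustration only; real logits would come from the model's forward pass.
example_logits = [-3.0, 2.0, -1.5, 1.0, -4.0, -2.5]
probs, predicted = decode_multilabel(example_logits)
print(predicted)  # ['joy', 'surprise']
```

Unlike softmax probabilities, these per-label probabilities do not need to sum to one, so a sentence can receive several emotions at once or none of them.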

The model was trained on the CEDR dataset described in the paper "Data-Driven Model for Emotion Detection in Russian Texts" by Sboev et al.

The model was trained with the Adam optimizer for 40 epochs, with a learning rate of 1e-5 and a batch size of 64, in this notebook.

The quality of the predicted probabilities on the test dataset is the following:

| label | no emotion | joy | sadness | surprise | fear | anger | mean | mean (emotions) |
|---|---|---|---|---|---|---|---|---|
| AUC | 0.9286 | 0.9512 | 0.9564 | 0.8908 | 0.8955 | 0.7511 | 0.8956 | 0.8890 |
| F1 micro | 0.8624 | 0.9389 | 0.9362 | 0.9469 | 0.9575 | 0.9261 | 0.9280 | 0.9411 |
| F1 macro | 0.8562 | 0.8962 | 0.9017 | 0.8366 | 0.8359 | 0.6820 | 0.8348 | 0.8305 |
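
The two summary columns appear to be plain unweighted averages: "mean" over all six labels, and "mean (emotions)" over the five labels other than "no emotion". This reading is inferred from the numbers rather than stated by the author. The short check below recomputes both columns from the per-label values reported above.

```python
# Per-label scores copied from the results table, in the order:
# no emotion, joy, sadness, surprise, fear, anger
scores = {
    "AUC": [0.9286, 0.9512, 0.9564, 0.8908, 0.8955, 0.7511],
    "F1 micro": [0.8624, 0.9389, 0.9362, 0.9469, 0.9575, 0.9261],
    "F1 macro": [0.8562, 0.8962, 0.9017, 0.8366, 0.8359, 0.6820],
}


def summarize(values):
    """Return (mean over all labels, mean over the emotion labels only)."""
    mean_all = sum(values) / len(values)
    emotions = values[1:]  # skip the first entry, "no emotion"
    mean_emotions = sum(emotions) / len(emotions)
    return mean_all, mean_emotions


summary = {metric: summarize(values) for metric, values in scores.items()}
for metric, (mean_all, mean_emotions) in summary.items():
    print(f"{metric}: mean={mean_all:.4f}, mean (emotions)={mean_emotions:.4f}")
```

The recomputed values match the table to four decimal places, which supports the unweighted-average interpretation.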