---
language: ["ru"]
tags:
- russian
- classification
- sentiment
- emotion-classification
- multiclass
datasets:
- cedr
widget:
- text: "Бесишь меня, падла"
- text: "Как здорово, что все мы здесь сегодня собрались"
- text: "Как-то стрёмно, давай свалим отсюда?"
- text: "Грусть-тоска меня съедает"
- text: "Данный фрагмент текста не содержит абсолютно никаких эмоций"
- text: "Нифига себе, неужели так тоже бывает!"
---

This is the [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2) model fine-tuned to classify emotions in Russian sentences. The task is multilabel classification, because one sentence can express several emotions at once.

The model was fine-tuned on the [CEDR dataset](https://huggingface.co/datasets/cedr) described in the paper ["Data-Driven Model for Emotion Detection in Russian Texts"](https://doi.org/10.1016/j.procs.2021.06.075) by Sboev et al.

The model was trained with the Adam optimizer for 40 epochs with a learning rate of `1e-5` and a batch size of 64 [in this notebook](https://colab.research.google.com/drive/1AFW70EJaBn7KZKRClDIdDUpbD46cEsat?usp=sharing).

The quality of the predicted probabilities on the test set is as follows:

| label    | no emotion | joy    | sadness | surprise | fear   | anger  | mean   | mean (emotions) |
|----------|------------|--------|---------|----------|--------|--------|--------|-----------------|
| AUC      | 0.9286     | 0.9512 | 0.9564  | 0.8908   | 0.8955 | 0.7511 | 0.8956 | 0.8890          |
| F1 micro | 0.8624     | 0.9389 | 0.9362  | 0.9469   | 0.9575 | 0.9261 | 0.9280 | 0.9411          |
| F1 macro | 0.8562     | 0.8962 | 0.9017  | 0.8366   | 0.8359 | 0.6820 | 0.8348 | 0.8305          |
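
Because the task is multilabel, each label is scored independently rather than through a softmax over all labels. The following is a minimal sketch of that decoding step, not the author's code. It assumes the classification head emits one logit per label, that the label order matches the columns of the metrics table, and that `0.5` is a reasonable cut-off. The names `decode_multilabel` and `predict_emotions` are hypothetical, and `model_id` is left as a required argument to be filled with the Hub ID or local path of the fine-tuned checkpoint.

```python
import math

# Assumption: label order follows the columns of the metrics table.
LABELS = ["no emotion", "joy", "sadness", "surprise", "fear", "anger"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multilabel(logits, labels=LABELS, threshold=0.5):
    """Return (probability per label, labels with probability >= threshold)."""
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    probs = {label: sigmoid(z) for label, z in zip(labels, logits)}
    predicted = [label for label, p in probs.items() if p >= threshold]
    return probs, predicted


def predict_emotions(text, model_id, threshold=0.5):
    """Score one sentence with a fine-tuned checkpoint (hypothetical helper).

    Requires `torch` and `transformers`; they are imported lazily so the
    decoding logic above can be used without them.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode_multilabel(logits, threshold=threshold)


# Hypothetical logits, chosen only to illustrate two labels firing at once.
example_logits = [-2.0, 1.5, -3.0, 0.8, -4.0, -1.0]
probs, predicted = decode_multilabel(example_logits)
print(predicted)  # ['joy', 'surprise']
```

A single threshold for all labels is the simplest choice; since the table shows noticeably lower scores for anger than for the other labels, per-label thresholds tuned on a validation split may be worth trying.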