---
license: mit
language:
  - ru
metrics:
  - f1
  - roc_auc
  - precision
  - recall
pipeline_tag: text-classification
tags:
  - sentiment-analysis
  - multi-label-classification
  - sentiment analysis
  - rubert
  - sentiment
  - bert
  - russian
  - multilabel
  - classification
  - emotion-classification
  - emotion-recognition
  - emotion
  - emotion-detection
datasets:
  - seara/ru_go_emotions
---

This is a RuBERT model fine-tuned for emotion classification of short Russian texts. The task is multi-label classification with the following labels (a snippet after the list shows how to read this mapping from the model config):

- 0: admiration
- 1: amusement
- 2: anger
- 3: annoyance
- 4: approval
- 5: caring
- 6: confusion
- 7: curiosity
- 8: desire
- 9: disappointment
- 10: disapproval
- 11: disgust
- 12: embarrassment
- 13: excitement
- 14: fear
- 15: gratitude
- 16: grief
- 17: joy
- 18: love
- 19: nervousness
- 20: optimism
- 21: pride
- 22: realization
- 23: relief
- 24: remorse
- 25: sadness
- 26: surprise
- 27: neutral
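
The index-to-label mapping is also stored in the model configuration and can be inspected programmatically; a minimal sketch using the standard `AutoConfig` API (the printed mapping is expected to match the list above):

```python
from transformers import AutoConfig

# Load only the configuration; id2label should reproduce the index -> label list above.
config = AutoConfig.from_pretrained("seara/rubert-base-cased-ru-go-emotions")
for idx in sorted(config.id2label):
    print(idx, config.id2label[idx])
```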

English labels and their Russian translations (a small helper for using this mapping in code follows the list):

- admiration: восхищение
- amusement: веселье
- anger: злость
- annoyance: раздражение
- approval: одобрение
- caring: забота
- confusion: непонимание
- curiosity: любопытство
- desire: желание
- disappointment: разочарование
- disapproval: неодобрение
- disgust: отвращение
- embarrassment: смущение
- excitement: возбуждение
- fear: страх
- gratitude: признательность
- grief: горе
- joy: радость
- love: любовь
- nervousness: нервозность
- optimism: оптимизм
- pride: гордость
- realization: осознание
- relief: облегчение
- remorse: раскаяние
- sadness: грусть
- surprise: удивление
- neutral: нейтральность
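
If you want to show predictions in Russian, this mapping can be kept as a plain dictionary; a small hypothetical helper (only a few rows of the table above are restated here for brevity):

```python
# English label -> Russian translation; restates part of the mapping above.
RU_LABELS = {
    "admiration": "восхищение",
    "amusement": "веселье",
    "love": "любовь",
    "neutral": "нейтральность",
}

def to_russian(label: str) -> str:
    """Return the Russian name for an English emotion label, falling back to the input."""
    return RU_LABELS.get(label, label)

print(to_russian("love"))  # любовь
```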

## Usage

```python
from transformers import pipeline

model = pipeline(model="seara/rubert-base-cased-ru-go-emotions")
model("Привет, ты мне нравишься!")
# [{'label': 'love', 'score': 0.5456761717796326}]
```
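
Since the task is multi-label, you may want scores for every label rather than only the top one. A minimal sketch, assuming the standard `top_k` and `function_to_apply` options of the text-classification pipeline (the 0.5 threshold below is an arbitrary choice, not a value fixed by this model card):

```python
from transformers import pipeline

clf = pipeline(model="seara/rubert-base-cased-ru-go-emotions")

# top_k=None returns a score for every label; sigmoid keeps the scores independent,
# as expected for a multi-label model.
scores = clf("Привет, ты мне нравишься!", top_k=None, function_to_apply="sigmoid")
predicted = [s for s in scores if s["score"] > 0.5]  # arbitrary 0.5 cut-off
print(predicted)
```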

## Dataset

This model was trained on the ru_go_emotions dataset, a Russian-translated version of the GoEmotions dataset.

An overview of the training data can be found on the dataset's Hugging Face card and in the accompanying GitHub repository.

## Training

Training was done in this project with the following parameters:

```yaml
tokenizer.max_length: null
batch_size: 32
optimizer: adam
lr: 0.00001
weight_decay: 0
num_epochs: 5
```
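
A minimal sketch of how these hyperparameters might map onto a Hugging Face `Trainer` run. The base checkpoint, dataset config name, split names, and preprocessing below are assumptions for illustration, not the exact training code of this project:

```python
import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

NUM_LABELS = 28  # 27 emotions + neutral

# Assumption: multi-label fine-tuning of the base RuBERT checkpoint on ru_go_emotions.
tokenizer = AutoTokenizer.from_pretrained("DeepPavlov/rubert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "DeepPavlov/rubert-base-cased",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # BCE-with-logits loss
)

# Config name assumed to mirror the original GoEmotions "simplified" setup.
dataset = load_dataset("seara/ru_go_emotions", "simplified")

def preprocess(batch):
    # tokenizer.max_length: null -> rely on the model's default maximum length.
    enc = tokenizer(batch["text"], truncation=True)
    # Convert each example's list of label ids into a multi-hot float vector.
    labels = np.zeros((len(batch["labels"]), NUM_LABELS), dtype=np.float32)
    for i, ids in enumerate(batch["labels"]):
        labels[i, ids] = 1.0
    enc["labels"] = labels.tolist()
    return enc

tokenized = dataset.map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)

args = TrainingArguments(
    output_dir="rubert-base-cased-ru-go-emotions",
    per_device_train_batch_size=32,  # batch_size: 32
    learning_rate=1e-5,              # lr: 0.00001
    weight_decay=0.0,                # weight_decay: 0
    num_train_epochs=5,              # num_epochs: 5
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],  # split names assumed to mirror GoEmotions
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()
```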

## Eval results (on test split)

| label | precision | recall | f1-score | auc-roc | support |
|---|---|---|---|---|---|
| admiration | 0.66 | 0.66 | 0.66 | 0.93 | 504 |
| amusement | 0.79 | 0.81 | 0.8 | 0.97 | 264 |
| anger | 0.53 | 0.3 | 0.39 | 0.91 | 198 |
| annoyance | 0.0 | 0.0 | 0.0 | 0.82 | 320 |
| approval | 0.62 | 0.25 | 0.36 | 0.82 | 351 |
| caring | 0.69 | 0.13 | 0.22 | 0.86 | 135 |
| confusion | 0.56 | 0.18 | 0.28 | 0.92 | 153 |
| curiosity | 0.52 | 0.4 | 0.45 | 0.95 | 284 |
| desire | 0.67 | 0.24 | 0.35 | 0.89 | 83 |
| disappointment | 0.88 | 0.05 | 0.09 | 0.82 | 151 |
| disapproval | 0.56 | 0.17 | 0.26 | 0.88 | 267 |
| disgust | 0.83 | 0.2 | 0.33 | 0.92 | 123 |
| embarrassment | 0.0 | 0.0 | 0.0 | 0.88 | 37 |
| excitement | 0.78 | 0.14 | 0.23 | 0.9 | 103 |
| fear | 0.83 | 0.37 | 0.51 | 0.92 | 78 |
| gratitude | 0.94 | 0.9 | 0.92 | 0.99 | 352 |
| grief | 0.0 | 0.0 | 0.0 | 0.72 | 6 |
| joy | 0.7 | 0.4 | 0.51 | 0.94 | 161 |
| love | 0.77 | 0.81 | 0.79 | 0.97 | 238 |
| nervousness | 0.0 | 0.0 | 0.0 | 0.85 | 23 |
| optimism | 0.66 | 0.52 | 0.58 | 0.92 | 186 |
| pride | 0.0 | 0.0 | 0.0 | 0.76 | 16 |
| realization | 0.0 | 0.0 | 0.0 | 0.74 | 145 |
| relief | 0.0 | 0.0 | 0.0 | 0.72 | 11 |
| remorse | 0.58 | 0.68 | 0.63 | 0.99 | 56 |
| sadness | 0.58 | 0.44 | 0.5 | 0.92 | 156 |
| surprise | 0.62 | 0.45 | 0.52 | 0.91 | 141 |
| neutral | 0.72 | 0.47 | 0.57 | 0.84 | 1787 |
| micro avg | 0.7 | 0.42 | 0.53 | 0.94 | 6329 |
| macro avg | 0.52 | 0.31 | 0.36 | 0.88 | 6329 |
| weighted avg | 0.63 | 0.42 | 0.49 | 0.88 | 6329 |
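
Per-label scores of this kind can be reproduced with scikit-learn. A minimal sketch, assuming you already have a multi-hot ground-truth matrix `y_true`, predicted probabilities `y_prob`, and thresholded predictions `y_pred` for the test split (the random arrays below are placeholders, not real model outputs):

```python
import numpy as np
from sklearn.metrics import classification_report, roc_auc_score

# Placeholder data standing in for real test-split outputs, shape (n_samples, 28).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(100, 28))
y_prob = rng.random((100, 28))
y_pred = (y_prob > 0.5).astype(int)  # 0.5 threshold is an arbitrary choice

# Per-label precision / recall / f1 plus micro, macro and weighted averages.
print(classification_report(y_true, y_pred, zero_division=0))

# Per-label ROC AUC (the "auc-roc" column in the table above).
print(roc_auc_score(y_true, y_prob, average=None))
```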