seara/rubert-base-cased-cedr-russian-emotion

This is a RuBERT model fine-tuned for emotion classification of short Russian texts. The task is multi-label classification with the following labels:

0: no_emotion
1: joy
2: sadness
3: surprise
4: fear
5: anger
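
Because the task is multi-label, a text can carry several of these labels at once, so each label gets its own probability instead of sharing one softmax. Below is a minimal sketch of turning six raw logits into a label set. It assumes an independent sigmoid per label and a 0.5 cut-off, and the card states neither. The logits are made up for illustration.

```python
import math

# Label order as listed in the card (index -> name).
ID2LABEL = ["no_emotion", "joy", "sadness", "surprise", "fear", "anger"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode(logits: list[float], threshold: float = 0.5) -> list[str]:
    """Return every label whose independent sigmoid probability reaches the
    (assumed) threshold; labels are not mutually exclusive."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    return [name for name, z in zip(ID2LABEL, logits) if sigmoid(z) >= threshold]


# Hypothetical logits in which joy and surprise both fire.
print(decode([-2.0, 1.5, -3.0, 0.4, -1.0, -2.5]))  # ['joy', 'surprise']
```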

English labels with their Russian equivalents:

no_emotion: нет эмоции
joy: радость
sadness: грусть
surprise: удивление
fear: страх
anger: злость
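
If Russian output is wanted, the mapping above can be applied to the pipeline's predictions. Here is a small sketch. `to_russian` is a hypothetical helper and is not part of the model or of `transformers`.

```python
# English -> Russian label names, as listed in the card.
LABEL_RU = {
    "no_emotion": "нет эмоции",
    "joy": "радость",
    "sadness": "грусть",
    "surprise": "удивление",
    "fear": "страх",
    "anger": "злость",
}


def to_russian(predictions: list[dict]) -> list[dict]:
    """Copy pipeline-style predictions, replacing 'label' with its Russian name."""
    return [{**p, "label": LABEL_RU[p["label"]]} for p in predictions]


print(to_russian([{"label": "joy", "score": 0.9388909935951233}]))
# [{'label': 'радость', 'score': 0.9388909935951233}]
```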

Usage

from transformers import pipeline
model = pipeline(model="seara/rubert-base-cased-cedr-russian-emotion")
model("Привет, ты мне нравишься!")
# [{'label': 'joy', 'score': 0.9388909935951233}]
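
By default the text-classification pipeline returns only the top-scoring label, which hides the multi-label nature of the task. The sketch below requests a score for every label with `top_k=None` and forces independent per-label scores with `function_to_apply="sigmoid"`. Both are existing pipeline arguments. The choice of sigmoid and the 0.5 threshold in the hypothetical `labels_above` helper are assumptions, not settings documented in this card. The model is loaded lazily, so nothing is downloaded until `all_scores` is called.

```python
from functools import lru_cache
from typing import Dict, List

MODEL_ID = "seara/rubert-base-cased-cedr-russian-emotion"


@lru_cache(maxsize=1)
def _classifier():
    """Build the pipeline once; importing here keeps the module usable without it."""
    from transformers import pipeline

    return pipeline(model=MODEL_ID)


def all_scores(text: str) -> List[Dict]:
    """Return a {'label', 'score'} dict for every label, not just the best one."""
    # top_k=None -> scores for all labels; sigmoid -> each label scored independently.
    return _classifier()(text, top_k=None, function_to_apply="sigmoid")


def labels_above(scores: List[Dict], threshold: float = 0.5) -> List[str]:
    """Keep every label whose score reaches the (assumed) threshold."""
    return [s["label"] for s in scores if s["score"] >= threshold]
```

For example, `labels_above(all_scores("Привет, ты мне нравишься!"))` returns the names of all labels scoring at least 0.5. The actual scores depend on the model and are not reproduced here.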

Dataset

This model was trained on the CEDR dataset.

An overview of the training data can be found in its Hugging Face dataset card or in the source article.
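
CEDR annotates each sentence with zero or more emotions, while this model predicts six labels that include `no_emotion`. One plausible way to build the training targets is a multi-hot vector that sets `no_emotion` only when no other emotion is present. The card does not describe its preprocessing, so treat this as an assumption. `to_multi_hot` is a hypothetical helper.

```python
LABELS = ["no_emotion", "joy", "sadness", "surprise", "fear", "anger"]
INDEX = {name: i for i, name in enumerate(LABELS)}


def to_multi_hot(emotions: list[str]) -> list[int]:
    """Encode a sentence's emotion names as a 0/1 vector over LABELS.

    Assumption: an empty annotation means 'no_emotion'.
    """
    target = [0] * len(LABELS)
    for name in emotions:
        target[INDEX[name]] = 1  # an unknown name raises KeyError
    if not any(target):
        target[INDEX["no_emotion"]] = 1
    return target


print(to_multi_hot(["joy", "surprise"]))  # [0, 1, 0, 1, 0, 0]
print(to_multi_hot([]))                   # [1, 0, 0, 0, 0, 0]
```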

Training

Training was done in this project with the following parameters:

tokenizer.max_length: null
batch_size: 64
optimizer: adam
lr: 0.00001
weight_decay: 0
num_epochs: 5
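
The parameter list maps naturally onto a small config object. The sketch below copies the listed values and adds two things the card does not state. The first is a step-count helper, useful for schedulers or logging. The second is a per-label binary cross-entropy loss, which is the usual objective for multi-label classification but is only an assumption here. The names `TrainConfig`, `total_steps`, and `bce_with_logits` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as listed in the card (field names are hypothetical)."""

    max_length: Optional[int] = None  # card says "null"; exact truncation behavior not stated
    batch_size: int = 64
    optimizer: str = "adam"
    lr: float = 1e-5
    weight_decay: float = 0.0
    num_epochs: int = 5

    def total_steps(self, num_train_examples: int) -> int:
        """Optimizer updates over the whole run, counting a final partial batch."""
        return math.ceil(num_train_examples / self.batch_size) * self.num_epochs


def bce_with_logits(logits: List[float], targets: List[int]) -> float:
    """Mean per-label binary cross-entropy computed from raw logits.

    Assumption: the card does not name its loss; this is the common
    multi-label choice (each label scored independently).
    """
    if len(logits) != len(targets) or not logits:
        raise ValueError("logits and targets must be non-empty and equal length")
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form of log(1 + e^z) - y*z; avoids overflow for large |z|.
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


cfg = TrainConfig()
print(cfg.total_steps(1000))  # 80, for a hypothetical 1,000-sentence training set
print(round(bce_with_logits([0.0, 0.0], [1, 0]), 4))  # 0.6931 (= ln 2)
```

With a hypothetical 1,000 training sentences, the run takes ceil(1000 / 64) × 5 = 16 × 5 = 80 optimizer steps. The real training-set size is not stated in this card.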

Eval results (on test split)

metric     no_emotion  joy   sadness  surprise  fear  anger  micro avg  macro avg  weighted avg
precision  0.87        0.84  0.85     0.74      0.70  0.66   0.83       0.78       0.83
recall     0.84        0.86  0.82     0.71      0.74  0.33   0.79       0.72       0.79
f1-score   0.86        0.85  0.84     0.72      0.72  0.44   0.81       0.74       0.80
auc-roc    0.95        0.97  0.96     0.94      0.93  0.86   0.95       0.93       0.95
support    734         353   379      170       141   125    1902       1902       1902
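
The averaged columns can be partly checked against the per-class rows. The short sketch below uses the rounded F1 and support values from the table. The macro average is the plain mean over the six labels. The weighted average weights each label by its support, and the support column sums to 1902, the figure shown in the average columns. Recomputing from two-decimal values can differ from the reported numbers in the last digit. The micro average cannot be recovered this way because it needs per-class true and false positive counts.

```python
# Per-class rows from the test-split table, in the order
# no_emotion, joy, sadness, surprise, fear, anger (rounded to two decimals).
F1 = [0.86, 0.85, 0.84, 0.72, 0.72, 0.44]
SUPPORT = [734, 353, 379, 170, 141, 125]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_avg(values: list[float]) -> float:
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_avg(values: list[float], support: list[int]) -> float:
    """Mean over classes weighted by each class's support."""
    return sum(v * s for v, s in zip(values, support)) / sum(support)


print(round(f1(0.66, 0.33), 2))             # anger F1: 0.44
print(round(macro_avg(F1), 2))              # 0.74
print(round(weighted_avg(F1, SUPPORT), 2))  # 0.8
```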
