This is a RuBERT model fine-tuned for emotion classification of short Russian texts. The task is multi-label classification with the following labels:

0: no_emotion
1: joy
2: sadness
3: surprise
4: fear
5: anger

Label to Russian label:

no_emotion: нет эмоции
joy: радость
sadness: грусть
surprise: удивление
fear: страх
anger: злость
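
The two mappings above can be kept as plain dictionaries when post-processing predictions. This is a minimal sketch; the names ID2LABEL, LABEL2RU and to_russian are illustrative and not part of the model's API.

```python
# Class index to label name, as listed in this card.
ID2LABEL = {
    0: "no_emotion",
    1: "joy",
    2: "sadness",
    3: "surprise",
    4: "fear",
    5: "anger",
}

# Label name to its Russian equivalent, as listed in this card.
LABEL2RU = {
    "no_emotion": "нет эмоции",
    "joy": "радость",
    "sadness": "грусть",
    "surprise": "удивление",
    "fear": "страх",
    "anger": "злость",
}


def to_russian(label: str) -> str:
    """Translate a predicted label name into Russian (hypothetical helper)."""
    return LABEL2RU[label]


print(to_russian(ID2LABEL[1]))
```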

Usage

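from transformers import pipeline

# No task is passed, so it is inferred from the model's Hub metadata.
model = pipeline(model="seara/rubert-base-cased-cedr-russian-emotion")
# Input text: "Hi, I like you!". By default only the top-scoring label is returned.
model("Привет, ты мне нравишься!")
# [{'label': 'joy', 'score': 0.9388909935951233}]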
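
Because the task is multi-label, a text may carry more than one emotion, while the call above returns only the top label. The sketch below asks the pipeline for a score per label and keeps those above a cutoff. The 0.5 threshold, the explicit sigmoid, and the helper names select_labels and classify are assumptions made for illustration, not settings documented for this model; the nesting of the pipeline's return value has also varied between transformers versions, so the indexing may need adjusting.

```python
from typing import Dict, List


def select_labels(scores: List[Dict[str, float]], threshold: float = 0.5) -> List[str]:
    """Keep every label whose score reaches the threshold (hypothetical helper)."""
    return [item["label"] for item in scores if item["score"] >= threshold]


def classify(text: str, threshold: float = 0.5) -> List[str]:
    """Run the model on one text and return all labels above the threshold."""
    # Imported here so select_labels can be used without transformers installed.
    from transformers import pipeline

    model = pipeline(model="seara/rubert-base-cased-cedr-russian-emotion")
    # top_k=None requests a score for every label; sigmoid scores each label
    # independently, which is the usual choice for multi-label outputs (assumption).
    scores = model([text], top_k=None, function_to_apply="sigmoid")[0]
    return select_labels(scores, threshold)
```

Calling classify("Привет, ты мне нравишься!") downloads the model on first use. The threshold trades precision against recall and is best tuned on validation data.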

Dataset

This model was trained on the CEDR dataset.

An overview of the training data can be found in its Hugging Face card or in the source article.

Training

Training was done in this project with these parameters:

tokenizer.max_length: null
batch_size: 64
optimizer: adam
lr: 0.00001
weight_decay: 0
num_epochs: 5
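
For readers who want to reproduce a similar setup, the parameters above could be mapped onto a PyTorch optimizer roughly as follows. This is a sketch under assumptions: the card does not say which Adam implementation was used, so torch.optim.Adam is a guess, and reading max_length: null as "use the tokenizer's default" is likewise an interpretation. TRAINING_CONFIG and build_optimizer are illustrative names.

```python
# Hyperparameters copied from the list above; null is represented as None.
TRAINING_CONFIG = {
    "max_length": None,  # assumed to mean the tokenizer's default limit
    "batch_size": 64,
    "optimizer": "adam",
    "lr": 0.00001,
    "weight_decay": 0.0,
    "num_epochs": 5,
}


def build_optimizer(model_parameters, config=TRAINING_CONFIG):
    """Create an Adam optimizer from the config (assumes PyTorch is installed)."""
    # Imported lazily so the config can be inspected without PyTorch.
    import torch

    return torch.optim.Adam(
        model_parameters,
        lr=config["lr"],
        weight_decay=config["weight_decay"],
    )
```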

Eval results (on test split)

metric     no_emotion   joy  sadness  surprise  fear  anger  micro avg  macro avg  weighted avg
precision        0.87  0.84     0.85      0.74  0.70   0.66       0.83       0.78          0.83
recall           0.84  0.86     0.82      0.71  0.74   0.33       0.79       0.72          0.79
f1-score         0.86  0.85     0.84      0.72  0.72   0.44       0.81       0.74          0.80
auc-roc          0.95  0.97     0.96      0.94  0.93   0.86       0.95       0.93          0.95
support           734   353      379       170   141    125       1902       1902          1902
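
As a reading aid, the sketch below shows how the f1-score row and its macro and weighted averages relate to the other rows of the table. It uses only the rounded values printed above, so per-class results recomputed from rounded precision and recall can differ from the table in the last digit. The micro average cannot be recovered this way because it needs the raw prediction counts. The function names are illustrative.

```python
# Per-class F1 and support copied from the table above.
F1 = {"no_emotion": 0.86, "joy": 0.85, "sadness": 0.84,
      "surprise": 0.72, "fear": 0.72, "anger": 0.44}
SUPPORT = {"no_emotion": 734, "joy": 353, "sadness": 379,
           "surprise": 170, "fear": 141, "anger": 125}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values: dict) -> float:
    """Unweighted mean over classes."""
    return sum(values.values()) / len(values)


def weighted_average(values: dict, support: dict) -> float:
    """Mean over classes weighted by the number of true examples per class."""
    total = sum(support.values())
    return sum(values[name] * support[name] for name in values) / total


print(round(f1_score(0.66, 0.33), 2))           # anger: 0.44
print(round(macro_average(F1), 2))              # 0.74
print(round(weighted_average(F1, SUPPORT), 2))  # 0.8
```

The gap between the macro (0.74) and weighted (0.80) F1 comes mostly from anger, which has the lowest recall (0.33) and the smallest support (125).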