---
license: mit
language:
- ru
metrics:
- f1
- roc_auc
- precision
- recall
pipeline_tag: text-classification
tags:
- sentiment-analysis
- multi-label-classification
- sentiment analysis
- rubert
- sentiment
- bert
- tiny
- russian
- multilabel
- classification
- emotion-classification
- emotion-recognition
- emotion
datasets:
- cedr
---

This is a [RuBERT](https://huggingface.co/DeepPavlov/rubert-base-cased) model fine-tuned for __emotion classification__ of short __Russian__ texts.
The task is a __multi-label classification__ with the following labels:

```yaml
0: no_emotion
1: joy
2: sadness
3: surprise
4: fear
5: anger
```

Mapping from each label to its Russian name:

```yaml
no_emotion: нет эмоции
joy: радость
sadness: грусть
surprise: удивление
fear: страх
anger: злость
```

## Usage

```python
from transformers import pipeline
model = pipeline(model="seara/rubert-base-cased-cedr-russian-emotion")
model("Привет, ты мне нравишься!")
# [{'label': 'joy', 'score': 0.9388909935951233}]
```
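Because the task is multi-label, a text can carry several emotions at once, so a single top prediction may be incomplete. Below is a minimal sketch of one way to turn per-label scores into a set of predicted labels; the 0.5 threshold and the example scores are illustrative assumptions, not part of the model card:

```python
# Label order matches the id-to-label table above.
LABELS = ["no_emotion", "joy", "sadness", "surprise", "fear", "anger"]

def decode_multilabel(scores, threshold=0.5):
    """Return every label whose score reaches the threshold."""
    return [label for label, s in zip(LABELS, scores) if s >= threshold]

# Hypothetical per-label scores, not real model output.
example_scores = [0.02, 0.94, 0.03, 0.61, 0.05, 0.01]
print(decode_multilabel(example_scores))  # ['joy', 'surprise']
```

With the `transformers` pipeline, passing `top_k=None` returns scores for all labels instead of only the best one, which can then be fed into a function like this.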

## Dataset

This model was trained on [CEDR dataset](https://huggingface.co/datasets/cedr).

An overview of the training data can be found in its [Hugging Face card](https://huggingface.co/datasets/cedr)
or in the source [article](https://www.sciencedirect.com/science/article/pii/S1877050921013247).

## Training

Training was done in this [project](https://github.com/searayeah/bert-russian-sentiment-emotion) with the following parameters:

```yaml
tokenizer.max_length: null
batch_size: 64
optimizer: adam
lr: 0.00001
weight_decay: 0
num_epochs: 5
```
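As a rough sketch, these hyperparameters might map onto a PyTorch setup as follows; the placeholder model and the `BCEWithLogitsLoss` choice are assumptions typical for multi-label fine-tuning, and the linked repository is the authoritative source:

```python
import torch
from torch import nn

# Placeholder standing in for the fine-tuned RuBERT classification head.
model = nn.Linear(768, 6)  # 6 emotion labels

# Matches the listed config: adam, lr=0.00001, weight_decay=0.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=0)

# Multi-label classification typically pairs a sigmoid with binary cross-entropy.
criterion = nn.BCEWithLogitsLoss()
```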

## Eval results (on test split)

|         |no_emotion|joy   |sadness|surprise|fear   |anger|micro avg|macro avg|weighted avg|
|---------|----------|------|-------|--------|-------|-----|---------|---------|------------|
|precision|0.87      |0.84  |0.85   |0.74    |0.70   |0.66 |0.83     |0.78     |0.83        |
|recall   |0.84      |0.86  |0.82   |0.71    |0.74   |0.33 |0.79     |0.72     |0.79        |
|f1-score |0.86      |0.85  |0.84   |0.72    |0.72   |0.44 |0.81     |0.74     |0.80        |
|auc-roc  |0.95      |0.97  |0.96   |0.94    |0.93   |0.86 |0.95     |0.93     |0.95        |
|support  |734       |353   |379    |170     |141    |125  |1902     |1902     |1902        |
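The micro averages pool true positives, false positives, and false negatives across all labels before computing each metric, which is why they track the high-support classes more closely than the macro averages (which average the per-label scores with equal weight). A small illustration with hypothetical per-label counts, not the actual test-set counts:

```python
# Hypothetical (tp, fp, fn) counts per label -- for illustration only.
counts = {
    "joy":   (300, 50, 53),
    "anger": (41, 21, 84),
}

# Micro averaging: sum the raw counts over labels first.
tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
fn = sum(c[2] for c in counts.values())

micro_precision = tp / (tp + fp)
micro_recall = tp / (tp + fn)
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)
print(round(micro_precision, 2), round(micro_recall, 2), round(micro_f1, 2))
```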