seara's picture
Update README.md
fb1cd19
|
raw
history blame
No virus
4.59 kB
---
license: mit
language:
- ru
metrics:
- f1
- roc_auc
- precision
- recall
pipeline_tag: text-classification
tags:
- sentiment-analysis
- multi-label-classification
- sentiment analysis
- rubert
- sentiment
- bert
- tiny
- russian
- multilabel
- classification
- emotion-classification
- emotion-recognition
- emotion
datasets:
- seara/ru_go_emotions
---
This is [RuBERT-tiny2](https://huggingface.co/cointegrated/rubert-tiny2) model fine-tuned for __emotion classification__ of short __Russian__ texts.
The task is a __multi-label classification__ with the following labels:
```yaml
0: admiration
1: amusement
2: anger
3: annoyance
4: approval
5: caring
6: confusion
7: curiosity
8: desire
9: disappointment
10: disapproval
11: disgust
12: embarrassment
13: excitement
14: fear
15: gratitude
16: grief
17: joy
18: love
19: nervousness
20: optimism
21: pride
22: realization
23: relief
24: remorse
25: sadness
26: surprise
27: neutral
```
Label to Russian label:
```yaml
admiration: восхищение
amusement: веселье
anger: злость
annoyance: раздражение
approval: одобрение
caring: забота
confusion: непонимание
curiosity: любопытство
desire: желание
disappointment: разочарование
disapproval: неодобрение
disgust: отвращение
embarrassment: смущение
excitement: возбуждение
fear: страх
gratitude: признательность
grief: горе
joy: радость
love: любовь
nervousness: нервозность
optimism: оптимизм
pride: гордость
realization: осознание
relief: облегчение
remorse: раскаяние
sadness: грусть
surprise: удивление
neutral: нейтральность
```
## Usage
```python
from transformers import pipeline
model = pipeline(model="seara/rubert-tiny2-ru-go-emotions")
model("Привет, ты мне нравишься!")
# [{'label': 'love', 'score': 0.5955629944801331}]
```
## Dataset
This model was trained on translated GoEmotions dataset called [ru_go_emotions](https://huggingface.co/datasets/seara/ru_go_emotions).
An overview of the training data can be found on [Hugging Face card](https://huggingface.co/datasets/seara/ru_go_emotions) and on
[Github repository](https://github.com/searayeah/ru-goemotions).
## Training
Training were done in this [project](https://github.com/searayeah/bert-russian-sentiment-emotion) with this parameters:
```yaml
tokenizer.max_length: null
batch_size: 64
optimizer: adam
lr: 0.00001
weight_decay: 0
num_epochs: 31
```
## Eval results (on test split)
| |precision|recall|f1-score|auc-roc|support|
|--------------|---------|------|--------|-------|-------|
|admiration |0.68 |0.61 |0.64 |0.92 |504 |
|amusement |0.8 |0.84 |0.82 |0.96 |264 |
|anger |0.55 |0.33 |0.42 |0.9 |198 |
|annoyance |0.56 |0.03 |0.06 |0.81 |320 |
|approval |0.6 |0.18 |0.28 |0.78 |351 |
|caring |0.5 |0.04 |0.07 |0.84 |135 |
|confusion |0.77 |0.07 |0.12 |0.9 |153 |
|curiosity |0.51 |0.34 |0.41 |0.92 |284 |
|desire |0.71 |0.18 |0.29 |0.88 |83 |
|disappointment|0.0 |0.0 |0.0 |0.76 |151 |
|disapproval |0.48 |0.1 |0.17 |0.85 |267 |
|disgust |0.94 |0.12 |0.22 |0.9 |123 |
|embarrassment |0.0 |0.0 |0.0 |0.84 |37 |
|excitement |0.81 |0.2 |0.33 |0.88 |103 |
|fear |0.73 |0.42 |0.54 |0.92 |78 |
|gratitude |0.95 |0.89 |0.92 |0.99 |352 |
|grief |0.0 |0.0 |0.0 |0.76 |6 |
|joy |0.66 |0.52 |0.58 |0.93 |161 |
|love |0.8 |0.79 |0.79 |0.97 |238 |
|nervousness |0.0 |0.0 |0.0 |0.81 |23 |
|optimism |0.67 |0.41 |0.51 |0.89 |186 |
|pride |0.0 |0.0 |0.0 |0.89 |16 |
|realization |0.0 |0.0 |0.0 |0.7 |145 |
|relief |0.0 |0.0 |0.0 |0.84 |11 |
|remorse |0.59 |0.71 |0.65 |0.99 |56 |
|sadness |0.77 |0.37 |0.5 |0.89 |156 |
|surprise |0.59 |0.35 |0.44 |0.88 |141 |
|neutral |0.64 |0.58 |0.61 |0.81 |1787 |
|micro avg |0.68 |0.43 |0.53 |0.93 |6329 |
|macro avg |0.51 |0.29 |0.33 |0.87 |6329 |
|weighted avg |0.62 |0.43 |0.48 |0.86 |6329 |