---
license: mit
language:
- ru
metrics:
- f1
- roc_auc
- precision
- recall
pipeline_tag: text-classification
tags:
- sentiment-analysis
- multi-label-classification
- sentiment analysis
- rubert
- sentiment
- bert
- tiny
- russian
- multilabel
- classification
- emotion-classification
- emotion-recognition
- emotion
datasets:
- cedr
---
This is the [RuBERT](https://huggingface.co/DeepPavlov/rubert-base-cased) model fine-tuned for __emotion classification__ of short __Russian__ texts.
The task is __multi-label classification__ with the following labels:
```yaml
0: no_emotion
1: joy
2: sadness
3: surprise
4: fear
5: anger
```
Label to Russian label:
```yaml
no_emotion: нет эмоции
joy: радость
sadness: грусть
surprise: удивление
fear: страх
anger: злость
```
## Usage
```python
from transformers import pipeline
model = pipeline(model="seara/rubert-base-cased-cedr-russian-emotion")
model("Привет, ты мне нравишься!")
# [{'label': 'joy', 'score': 0.9388909935951233}]
```
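Because the model is multi-label, several emotions can apply to one text, while the default pipeline output above shows only the top label. The post-processing is just a per-label sigmoid followed by a threshold; the sketch below illustrates that step in plain Python (the logits, label order, and 0.5 threshold are illustrative, not actual model output):

```python
import math

LABELS = ["no_emotion", "joy", "sadness", "surprise", "fear", "anger"]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def logits_to_labels(logits, threshold=0.5):
    """Apply a per-label sigmoid and keep every label above the threshold."""
    scores = [sigmoid(z) for z in logits]
    return [(lab, round(s, 3)) for lab, s in zip(LABELS, scores) if s >= threshold]

# Illustrative logits for a text expressing both joy and surprise
print(logits_to_labels([-3.1, 2.7, -2.0, 0.4, -1.5, -2.2]))
```

With the `transformers` pipeline, passing `top_k=None` and `function_to_apply="sigmoid"` returns the full per-label score list instead of a single label.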
## Dataset
This model was trained on [CEDR dataset](https://huggingface.co/datasets/cedr).
An overview of the training data can be found in its [Hugging Face card](https://huggingface.co/datasets/cedr)
or in the source [article](https://www.sciencedirect.com/science/article/pii/S1877050921013247).
## Training
Training was done in this [project](https://github.com/searayeah/bert-russian-sentiment-emotion) with the following parameters:
```yaml
tokenizer.max_length: null
batch_size: 64
optimizer: adam
lr: 0.00001
weight_decay: 0
num_epochs: 5
```
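To make the `optimizer`, `lr`, and `weight_decay` settings concrete, here is a single scalar Adam update step in plain Python (a sketch of the standard Adam algorithm with this card's hyperparameters; the actual training loop lives in the linked project):

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-5, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0):
    """One Adam update for a single scalar parameter (illustrative)."""
    grad = grad + weight_decay * param        # weight_decay is 0 in this config
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# One step from param=1.0 with gradient 0.5 moves the parameter by about lr
p, m, v = adam_step(1.0, grad=0.5, m=0.0, v=0.0, t=1)
```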
## Eval results (on test split)
| |no_emotion|joy |sadness|surprise|fear |anger|micro avg|macro avg|weighted avg|
|---------|----------|------|-------|--------|-------|-----|---------|---------|------------|
|precision|0.87      |0.84  |0.85   |0.74    |0.70   |0.66 |0.83     |0.78     |0.83        |
|recall   |0.84      |0.86  |0.82   |0.71    |0.74   |0.33 |0.79     |0.72     |0.79        |
|f1-score |0.86      |0.85  |0.84   |0.72    |0.72   |0.44 |0.81     |0.74     |0.80        |
|auc-roc |0.95 |0.97 |0.96 |0.94 |0.93 |0.86 |0.95 |0.93 |0.95 |
|support |734 |353 |379 |170 |141 |125 |1902 |1902 |1902 |