---
language: ["ru"]
tags:
- russian
- classification
- sentiment
- multiclass
widget:
- text: "Какая гадость эта ваша заливная рыба!"
---
## Sentiment model based on rubert-base-cased-conversational
This model was initialized with [rubert-base-cased-conversational](https://huggingface.co/DeepPavlov/rubert-base-cased-conversational) weights and trained on a collection of datasets gathered by [Smetanin](https://duckduckgo.com), using the same training sampling as [rubert-tiny-sentiment-balanced](https://huggingface.co/cointegrated/rubert-tiny-sentiment-balanced). This sampling gives a uniform distribution across the source datasets and across the three sentiment classes: negative, neutral, and positive. The datasets were prepared by David Dale and are hosted [here](https://drive.google.com/file/d/1dir_lixYfReDXxRS5oGGljH8T_f7vVqm/view).
I chose the rubert-base-cased-conversational weights because, according to Smetanin's work, this model ranks first among the multilingual and popular Russian language models with the BERT base architecture.
### Training and Testing Details
This model was trained and tested using the code and hyperparameters from the [rubert-tiny-sentiment-balanced](https://huggingface.co/cointegrated/rubert-tiny-sentiment-balanced) work.
### Labels
The model predicts one of three labels: negative (0), neutral (1), and positive (2).
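The label mapping above can be applied to the classifier's raw outputs as in the minimal sketch below. It assumes the model returns one logit per class in the order 0, 1, 2; the logit values and the helper names `softmax` and `decode` are hypothetical and used only for illustration.

```python
import math

# Label ids as listed in this model card.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return the most likely label name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for a single input text, not real model output.
label, score = decode([-1.2, 0.3, 2.5])
print(label, round(score, 3))
```

With a real checkpoint, the logits would come from the sequence classification head instead of a hard-coded list.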
## Results
This model outperforms rubert-tiny-sentiment-balanced on four datasets, underperforms on one (linis), and matches it on mokoron and rureviews. See the [rubert-tiny-sentiment-balanced model card](https://huggingface.co/cointegrated/rubert-tiny-sentiment-balanced) for the comparison.
| Source | Macro F1 |
| ----------- | ----------- |
| SentiRuEval2016_banks | 0.88 |
| SentiRuEval2016_tele | 0.79 |
| kaggle_news | 0.73 |
| linis | 0.46 |
| mokoron | 0.98 |
| rureviews | 0.77 |
| rusentiment | 0.74 |
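
For readers unfamiliar with the metric in the table, here is a minimal sketch of macro F1 for the three labels. It assumes the standard unweighted average of per-class F1 scores, with a class that has no true or predicted examples scored as 0; the exact evaluation code behind the reported numbers is not shown in this card, and the toy labels below are made up.

```python
def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 scores."""
    f1_scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is absent.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example with made-up labels (0 = negative, 1 = neutral, 2 = positive).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(macro_f1(y_true, y_pred), 3))
```

Because every class contributes equally to the average, a weak class lowers the score even if it is rare in the test set.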