---
language: ["ru"]
tags:
- russian
- classification
- sentiment
- multiclass
widget:
- text: "Мне очень жаль"
---

## Sentiment model based on rubert-base-cased-conversational

This model was initialized with [rubert-base-cased-conversational](https://huggingface.co/DeepPavlov/rubert-base-cased-conversational) weights and trained on a collection of datasets gathered by [Smetanin](https://duckduckgo.com), using the same training sampling scheme presented in [this wonderful work](https://huggingface.co/cointegrated/rubert-tiny-sentiment-balanced). The sampling balances the training data across the source datasets and across the three sentiment classes: negative, neutral, and positive. The datasets were prepared by David Dale and are hosted [here](https://drive.google.com/file/d/1dir_lixYfReDXxRS5oGGljH8T_f7vVqm/view).

I chose the rubert-base-cased-conversational weights because, according to Smetanin's work, this model ranks first among the multilingual and popular Russian-language models with the BERT base architecture.

### Training and Testing Details

This model was trained and tested using the code and hyperparameters from the [rubert-tiny-sentiment-balanced](https://huggingface.co/cointegrated/rubert-tiny-sentiment-balanced) work.

### Labels

The model predicts one of three labels: negative (0), neutral (1), positive (2).

## Results

The model outperforms rubert-tiny-sentiment-balanced on four datasets, underperforms on one (linis), and matches it on mokoron and rureviews. See the [rubert-tiny-sentiment-balanced model card](https://huggingface.co/cointegrated/rubert-tiny-sentiment-balanced) for the comparison.

| Source                | Macro F1 |
| --------------------- | -------- |
| SentiRuEval2016_banks | 0.88     |
| SentiRuEval2016_tele  | 0.79     |
| kaggle_news           | 0.73     |
| linis                 | 0.46     |
| mokoron               | 0.98     |
| rureviews             | 0.77     |
| rusentiment           | 0.74     |
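
### Example: decoding labels and computing Macro F1

The label ids and the Macro F1 metric reported above can be illustrated with a small, dependency-free sketch. It is not the training or evaluation code used for this model: the helper names (`ID2LABEL`, `predict_label`, `macro_f1`) and the logit values in the usage note are hypothetical. The sketch assumes the model returns one logit per class in the order 0, 1, 2, and it scores a class with no true or predicted examples as 0.

```python
import math

# Label ids as listed in the Labels section of this card.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return (label name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 scores.

    Assumption: a class with no true and no predicted examples scores 0.
    """
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For instance, `predict_label([2.0, 0.1, -1.0])` picks the negative class because index 0 holds the largest logit, and `macro_f1(y_true, y_pred)` can be called once per source dataset to produce a table like the one above. Because Macro F1 weights all three classes equally, a rare class counts as much as a frequent one.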