---
license: mit
language:
- ru
metrics:
- f1
- roc_auc
- precision
- recall
pipeline_tag: text-classification
tags:
- sentiment-analysis
- multi-class-classification
- sentiment analysis
- rubert
- sentiment
- bert
- russian
- multiclass
- classification
---

A [RuBERT](https://huggingface.co/DeepPavlov/rubert-base-cased) model fine-tuned for __sentiment classification__ of short __Russian__ texts.

The task is __multi-class classification__ with the following labels:

```yaml
0: neutral
1: positive
2: negative
```

## Usage

```python
from transformers import pipeline

# The task (text-classification) is inferred from the model's configuration.
model = pipeline(model="r1char9/rubert-base-cased-russian-sentiment")

# Input text: "Hi, I like you!"
model("Привет, ты мне нравишься!")
# [{'label': 'positive', 'score': 0.8220236897468567}]
```

## Dataset

The model was trained on the following datasets:

- Kaggle Russian News Dataset
- Linis Crowd 2015
- Linis Crowd 2016
- RuReviews
- RuSentiment

Training configuration:

```yaml
tokenizer.max_length: 256
batch_size: 32
optimizer: adam
lr: 0.00001
weight_decay: 0
epochs: 2
```

Train/validation/test splits are 80%/10%/10%.

## Eval results (on test split)

|         |neutral|positive|negative|macro avg|weighted avg|
|---------|-------|--------|--------|---------|------------|
|precision|0.72   |0.85    |0.75    |0.77     |0.77        |
|recall   |0.75   |0.84    |0.72    |0.77     |0.77        |
|f1-score |0.73   |0.84    |0.73    |0.77     |0.77        |
|auc-roc  |0.86   |0.96    |0.92    |0.91     |0.91        |
|support  |5196   |3831    |3599    |12626    |12626       |
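
The macro and weighted averages in the evaluation table can be approximately recomputed from the per-class scores and supports. The sketch below is only an illustration: the helper names `macro_avg` and `weighted_avg` are hypothetical and not part of the model or of any library, and it assumes the averages follow the usual definitions (unweighted mean and support-weighted mean). Because the per-class values in the table are already rounded to two decimals, the recomputed averages may differ from the reported ones by about 0.01.

```python
# Per-class scores and supports copied from the evaluation table.
support = {"neutral": 5196, "positive": 3831, "negative": 3599}
per_class = {
    "precision": {"neutral": 0.72, "positive": 0.85, "negative": 0.75},
    "recall": {"neutral": 0.75, "positive": 0.84, "negative": 0.72},
    "f1-score": {"neutral": 0.73, "positive": 0.84, "negative": 0.73},
    "auc-roc": {"neutral": 0.86, "positive": 0.96, "negative": 0.92},
}


def macro_avg(scores):
    """Unweighted mean over classes (hypothetical helper)."""
    return sum(scores.values()) / len(scores)


def weighted_avg(scores, support):
    """Mean over classes weighted by class support (hypothetical helper)."""
    total = sum(support.values())
    return sum(scores[label] * support[label] for label in scores) / total


# Map each metric name to a (macro, weighted) pair.
averages = {
    metric: (macro_avg(scores), weighted_avg(scores, support))
    for metric, scores in per_class.items()
}

for metric, (macro, weighted) in averages.items():
    print(f"{metric:<10} macro={macro:.3f} weighted={weighted:.3f}")
```

The neutral class has the largest support, so the weighted averages lean slightly toward its scores, which are lower than the positive-class scores in every row.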