---
language:
- ru
tags:
- toxic comments classification
licenses:
- cc-by-nc-sa
---
BERT-based classifier (fine-tuned from [Conversational RuBERT](https://huggingface.co/DeepPavlov/rubert-base-cased-conversational)) trained on a merge of the Russian Language Toxic Comments [dataset](https://www.kaggle.com/blackmoon/russian-language-toxic-comments/metadata), collected from 2ch.hk, and the Toxic Russian Comments [dataset](https://www.kaggle.com/alexandersemiletov/toxic-russian-comments), collected from ok.ru.
The datasets were merged, shuffled, and split into train, dev, and test sets in an 80/10/10 proportion.
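
The card does not publish its preprocessing code. As a rough illustration of the merge, shuffle, and 80/10/10 split described above, here is a minimal pure-Python sketch. The function name `merge_shuffle_split`, the `(text, label)` tuple format, and the fixed seed are hypothetical choices for illustration, not the original training setup.

```python
import random
from typing import List, Tuple

# Hypothetical record format: (comment text, binary label)
Example = Tuple[str, int]


def merge_shuffle_split(
    first: List[Example],
    second: List[Example],
    seed: int = 42,  # hypothetical seed; the card does not state one
) -> Tuple[List[Example], List[Example], List[Example]]:
    """Merge two labelled datasets, shuffle them, and split 80/10/10 into train/dev/test."""
    merged = list(first) + list(second)
    random.Random(seed).shuffle(merged)  # seeded shuffle, so the split is reproducible
    n = len(merged)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding surprises
    n_dev = n // 10
    train = merged[:n_train]
    dev = merged[n_train:n_train + n_dev]
    test = merged[n_train + n_dev:]  # any remainder from rounding goes to test
    return train, dev, test
```

For example, merging 60 and 40 toy examples yields splits of 80, 10, and 10 items. A plain random split like this does not stratify by label, so the toxic/non-toxic ratio in each split only approximately matches the merged data.
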
The metrics obtained on the test set are as follows:

| | precision | recall | f1-score | support |
|:------------:|:---------:|:------:|:--------:|:-------:|
| 0 | 0.98 | 0.99 | 0.98 | 21384 |
| 1 | 0.94 | 0.92 | 0.93 | 4886 |
| accuracy | | | 0.97 | 26270 |
| macro avg | 0.96 | 0.96 | 0.96 | 26270 |
| weighted avg | 0.97 | 0.97 | 0.97 | 26270 |
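
The averaged rows appear to follow scikit-learn's `classification_report` conventions (an assumption, since the card does not say how the table was produced). Under that convention, "macro avg" is the unweighted mean of the per-class scores and "weighted avg" weights each class by its support. The sketch below recomputes both from the rounded per-class rows. Because its inputs are rounded to two decimals, the results agree with the table only to within about 0.01. The function names are illustrative.

```python
from typing import Dict, Tuple

Row = Tuple[float, float, float, int]  # (precision, recall, f1, support)

# Per-class rows copied from the test-set table above
PER_CLASS: Dict[int, Row] = {
    0: (0.98, 0.99, 0.98, 21384),
    1: (0.94, 0.92, 0.93, 4886),
}


def macro_avg(rows: Dict[int, Row]) -> Tuple[float, ...]:
    """Unweighted mean of precision, recall and F1 across classes."""
    k = len(rows)
    return tuple(sum(r[i] for r in rows.values()) / k for i in range(3))


def weighted_avg(rows: Dict[int, Row]) -> Tuple[float, ...]:
    """Support-weighted mean of precision, recall and F1 across classes."""
    total = sum(r[3] for r in rows.values())
    return tuple(sum(r[i] * r[3] for r in rows.values()) / total for i in range(3))


if __name__ == "__main__":
    print("support:", sum(r[3] for r in PER_CLASS.values()))  # 26270
    print("macro avg:", [round(x, 3) for x in macro_avg(PER_CLASS)])
    print("weighted avg:", [round(x, 3) for x in weighted_avg(PER_CLASS)])
    # For single-label classification, support-weighted recall equals accuracy
    print("accuracy via weighted recall:", round(weighted_avg(PER_CLASS)[1], 3))
```

The last line offers a quick sanity check on the accuracy row. Weighted recall is the sum of correct predictions across classes divided by the total support, which is exactly accuracy. Its value here is consistent with the reported 0.97 within rounding.
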