---
language:
  - ru
tags:
  - toxic comments classification
licenses:
  - cc-by-nc-sa
---

BERT-based classifier (fine-tuned from Conversational RuBERT) trained on a merge of the Russian Language Toxic Comments dataset, collected from 2ch.hk, and the Toxic Russian Comments dataset, collected from ok.ru.

The datasets were merged, shuffled, and split into train, dev, and test sets in an 80-10-10 proportion. The metrics obtained on the test set are as follows:
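
The merge, shuffle, and 80-10-10 split described above can be sketched roughly as follows. This is a minimal illustration, not the exact preprocessing used for this model: the helper name `split_80_10_10`, the seed, and the toy `(text, label)` pairs are all assumptions made for the example.

```python
import random


def split_80_10_10(examples, seed=0):
    """Shuffle a copy of `examples` and cut it into train/dev/test parts.

    Hypothetical helper: the seed and the rounding rule are assumptions.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10   # 80% for training
    n_dev = len(shuffled) // 10         # 10% for development
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]   # remainder (about 10%) for testing
    return train, dev, test


# Toy stand-ins for the two source datasets: (text, label) pairs.
dataset_2ch = [(f"2ch comment {i}", i % 2) for i in range(60)]
dataset_ok = [(f"ok.ru comment {i}", int(i % 3 == 0)) for i in range(40)]

merged = dataset_2ch + dataset_ok
train, dev, test = split_80_10_10(merged)
print(len(train), len(dev), len(test))  # 80 10 10
```

Fixing the seed keeps the split reproducible, so the same test examples are held out on every run.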

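|              | precision | recall | f1-score | support |
|:------------:|:---------:|:------:|:--------:|:-------:|
|       0      |    0.98   |  0.99  |   0.98   |  21384  |
|       1      |    0.94   |  0.92  |   0.93   |   4886  |
|   accuracy   |           |        |   0.97   |  26270  |
|   macro avg  |    0.96   |  0.96  |   0.96   |  26270  |
| weighted avg |    0.97   |  0.97  |   0.97   |  26270  |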
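
As a sanity check, the macro and weighted averages can be recomputed from the two per-class rows. The sketch below assumes the report follows the usual scikit-learn `classification_report` layout, where the macro average is the unweighted mean over classes and the weighted average uses each class's support as its weight. Because the per-class values are already rounded to two decimals, the results match the reported averages only to within about 0.01.

```python
# Per-class rows copied from the report: (precision, recall, f1, support).
per_class = {
    0: (0.98, 0.99, 0.98, 21384),
    1: (0.94, 0.92, 0.93, 4886),
}


def macro_avg(rows, idx):
    """Unweighted mean of one metric column over all classes."""
    return sum(r[idx] for r in rows.values()) / len(rows)


def weighted_avg(rows, idx):
    """Mean of one metric column, weighted by each class's support."""
    total = sum(r[3] for r in rows.values())
    return sum(r[idx] * r[3] for r in rows.values()) / total


total_support = sum(r[3] for r in per_class.values())
macro = [macro_avg(per_class, i) for i in range(3)]
weighted = [weighted_avg(per_class, i) for i in range(3)]

print("support:", total_support)
print("macro avg:", [round(v, 3) for v in macro])
print("weighted avg:", [round(v, 3) for v in weighted])
```

The gap between the per-class scores for class 1 and the weighted averages reflects the class imbalance: class 0 accounts for roughly four fifths of the test examples.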