sismetanin committed on
Commit
1c07abf
1 Parent(s): 176626e

Update README.md

Files changed (1)
  1. README.md +35 -1
README.md CHANGED
@@ -1 +1,35 @@
- RuBERT
+ ---
+ language:
+ - ru
+
+ tags:
+ - toxic comments classification
+ ---
+
+ ## RuBERT-Toxic
+ RuBERT-Toxic is a [RuBERT](https://huggingface.co/DeepPavlov/rubert-base-cased) model fine-tuned on the [Kaggle Russian Language Toxic Comments Dataset](https://www.kaggle.com/blackmoon/russian-language-toxic-comments). A detailed description of the data and the fine-tuning process is available in [this article](http://doi.org/10.28995/2075-7182-2020-19-1149-1159).
+
+ | System | Precision | Recall | F<sub>1</sub> |
+ | ------------- | ------------- | ------------- | ------------- |
+ | MNB-Toxic | 87.01% | 81.22% | 83.21% |
+ | M-BERT<sub>Base</sub>-Toxic | 91.19% | 91.10% | 91.15% |
+ | <b>RuBERT-Toxic</b> | <b>91.91%</b> | <b>92.51%</b> | <b>92.20%</b> |
+ | M-USE<sub>CNN</sub>-Toxic | 89.69% | 90.14% | 89.91% |
+ | M-USE<sub>Trans</sub>-Toxic | 90.85% | 91.92% | 91.35% |
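
As a reading aid for the table above, the sketch below compares each reported F<sub>1</sub> with the harmonic mean of the reported precision (P) and recall (R). The two do not always coincide (most visibly for MNB-Toxic). One possible explanation, which is an assumption here because this README does not say how the scores were aggregated, is that the metrics are averaged over the toxic and non-toxic classes; in that case F<sub>1</sub> need not equal the harmonic mean of the averaged P and R. The helper name `harmonic_f1` is illustrative only.

```python
# Scores in percent, copied from the results table: (precision, recall, F1).
reported = {
    "MNB-Toxic": (87.01, 81.22, 83.21),
    "M-BERT-Base-Toxic": (91.19, 91.10, 91.15),
    "RuBERT-Toxic": (91.91, 92.51, 92.20),
    "M-USE-CNN-Toxic": (89.69, 90.14, 89.91),
    "M-USE-Trans-Toxic": (90.85, 91.92, 91.35),
}


def harmonic_f1(precision, recall):
    """F1 as the harmonic mean of a single precision/recall pair."""
    return 2 * precision * recall / (precision + recall)


for name, (p, r, f1) in reported.items():
    recomputed = harmonic_f1(p, r)
    # A non-zero gap hints that the reported figures are aggregated
    # across classes (an assumption, not stated in the README).
    print(f"{name}: harmonic={recomputed:.2f} reported={f1:.2f} "
          f"gap={recomputed - f1:+.2f}")
```

The article linked above remains the authoritative source for how the metrics were computed.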
+
+
+ ## Toxic Comments Dataset
+ The [Kaggle Russian Language Toxic Comments Dataset](https://www.kaggle.com/blackmoon/russian-language-toxic-comments) is a collection of annotated Russian-language comments from [2ch](https://2ch.hk/) and [Pikabu](https://pikabu.ru/), published on Kaggle in 2019. It contains 14,412 comments, of which 4,826 are labelled toxic and 9,586 non-toxic. The average comment length is ~175 characters; the minimum is 21 and the maximum is 7,403.
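
The class counts quoted above imply a roughly one-to-two split between toxic and non-toxic comments. The short sketch below works that out from the stated numbers. The inverse-frequency class weights at the end are purely illustrative (a common heuristic for imbalanced data) and are not claimed to be part of the fine-tuning procedure described in the article.

```python
# Label counts as stated in the dataset description.
toxic = 4826
non_toxic = 9586

total = toxic + non_toxic
toxic_share = toxic / total

print(f"total comments: {total}")
print(f"toxic share: {toxic_share:.1%}")

# Hypothetical inverse-frequency weights: total / (n_classes * class_count).
# Shown only to illustrate the imbalance; not the author's method.
n_classes = 2
class_weights = {
    "non_toxic": total / (n_classes * non_toxic),
    "toxic": total / (n_classes * toxic),
}
print(class_weights)
```

With these counts the toxic class is the minority, so any such weighting scheme would assign it the larger weight.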
+
+ ## Citation
+ If you find this repository helpful, feel free to cite our publication:
+
+ ```
+ @INPROCEEDINGS{Smetanin2020Toxic,
+ author={Sergey Smetanin},
+ booktitle={Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2020”},
+ title={Toxic Comments Detection in Russian},
+ year={2020},
+ doi={10.28995/2075-7182-2020-19-1149-1159}
+ }
+ ```