Commit ba685c8
Parent(s): 1c07abf
Update README.md
README.md CHANGED
@@ -17,6 +17,8 @@ RuBERT-Toxic is a [RuBERT](https://huggingface.co/DeepPavlov/rubert-base-cased)
 | M-USE<sub>CNN</sub>-Toxic | 89.69% | 90.14% | 89.91% |
 | M-USE<sub>Trans</sub>-Toxic | 90.85% | 91.92% | 91.35% |
 
+We fine-tuned two versions of Multilingual Universal Sentence Encoder, Multilingual Bidirectional Encoder Representations from Transformers and RuBERT for toxic comments detection in Russian. Fine-tuned RuBERT-Toxic achieved F<sub>1</sub> = 92.20%, demonstrating the best classification score.
+
 
 ## Toxic Comments Dataset
 [Kaggle Russian Language Toxic Comments Dataset](https://www.kaggle.com/blackmoon/russian-language-toxic-comments) is the collection of Russian-language annotated comments from [2ch](https://2ch.hk/) and [Pikabu](https://pikabu.ru/), which was published on Kaggle in 2019. It consists of 14412 comments, where 4826 texts were labelled as toxic, and 9586 were labelled as non-toxic. The average length of comments is ~175 characters; the minimum length is 21, and the maximum is 7403.
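The class counts quoted in the dataset description imply a noticeable imbalance between toxic and non-toxic comments. The sketch below only does the arithmetic on those stated counts. The helper names `class_shares` and `balanced_class_weights` are hypothetical, and the inverse-frequency weighting (the same formula as scikit-learn's "balanced" heuristic) is an illustrative assumption: the README does not say whether any class weighting was used during fine-tuning.

```python
# Counts as stated in the dataset description; nothing here is read from Kaggle.
TOTAL = 14412
TOXIC = 4826
NON_TOXIC = 9586


def class_shares(toxic: int, non_toxic: int) -> dict:
    """Return the fraction of the dataset that falls into each class."""
    total = toxic + non_toxic
    return {"toxic": toxic / total, "non_toxic": non_toxic / total}


def balanced_class_weights(toxic: int, non_toxic: int) -> dict:
    """Inverse-frequency weights: total / (n_classes * class_count).

    Hypothetical helper; whether the authors weighted classes is not stated.
    """
    total = toxic + non_toxic
    n_classes = 2
    return {
        "toxic": total / (n_classes * toxic),
        "non_toxic": total / (n_classes * non_toxic),
    }


# Sanity check: the two stated class counts add up to the stated total.
assert TOXIC + NON_TOXIC == TOTAL

shares = class_shares(TOXIC, NON_TOXIC)
weights = balanced_class_weights(TOXIC, NON_TOXIC)

print(f"toxic share:      {shares['toxic']:.1%}")
print(f"non-toxic share:  {shares['non_toxic']:.1%}")
print(f"toxic weight:     {weights['toxic']:.3f}")
print(f"non-toxic weight: {weights['non_toxic']:.3f}")
```

Roughly one comment in three is labelled toxic, so plain accuracy would be a weak yardstick here. That fits the choice to report F<sub>1</sub> in the results table.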