NiGuLa committed on
Commit
0f009fd
1 Parent(s): 3a7a996

Update README.md

  1. README.md +34 -2
README.md CHANGED
@@ -9,15 +9,47 @@ licenses:
  - cc-by-nc-sa
  ---

-
  ## General concept of the model

- This model is trained on a dataset of sensitive topics in Russian. The concept of sensitive topics is described [in this article](https://arxiv.org/abs/2103.05345) presented at the Balto-Slavic NLP workshop at the EACL-2021 conference. Please note that this article describes the first version of the dataset, while the model is trained on the extended version of the dataset open-sourced on our [GitHub](https://github.com/skoltech-nlp/inappropriate-sensitive-topics/blob/main/Version2/appropriateness/Appropriateness.csv). The properties of the dataset are the same as those described in the article; the only difference is the size.

  ## Instructions

  The model predicts combinations of 18 sensitive topics described in the [article](https://arxiv.org/abs/2103.05345). You can find step-by-step instructions for using the model [here](https://github.com/skoltech-nlp/inappropriate-sensitive-topics/blob/main/Version2/sensitive_topics/Inference.ipynb).

  ## Licensing Information

  [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].

  - cc-by-nc-sa
  ---

  ## General concept of the model

+ This model is trained on a dataset of sensitive topics in Russian. The concept of sensitive topics is described [in this article](https://arxiv.org/abs/2103.05345) presented at the Balto-Slavic NLP workshop at the EACL-2021 conference. Please note that this article describes the first version of the dataset, while the model is trained on the extended version of the dataset open-sourced on our [GitHub](https://github.com/skoltech-nlp/inappropriate-sensitive-topics/blob/main/Version2/sensitive_topics/sensitive_topics.csv). The properties of the dataset are the same as those described in the article; the only difference is the size.
+

  ## Instructions

  The model predicts combinations of 18 sensitive topics described in the [article](https://arxiv.org/abs/2103.05345). You can find step-by-step instructions for using the model [here](https://github.com/skoltech-nlp/inappropriate-sensitive-topics/blob/main/Version2/sensitive_topics/Inference.ipynb).

+
+ ## Metrics
+
+ The dataset consists partly of manually labeled samples and partly of semi-automatically labeled samples; see our article for details. We evaluated the classifier only on the manually labeled part of the data, which is why some topics are poorly represented in the test set.
+
+
+ | | precision | recall | f1-score | support |
+ |-------------------|-----------|--------|----------|---------|
+ | offline_crime | 0.65 | 0.55 | 0.6 | 132 |
+ | online_crime | 0.5 | 0.46 | 0.48 | 37 |
+ | drugs | 0.87 | 0.9 | 0.88 | 87 |
+ | gambling | 0.5 | 0.67 | 0.57 | 6 |
+ | pornography | 0.73 | 0.59 | 0.65 | 204 |
+ | prostitution | 0.75 | 0.69 | 0.72 | 91 |
+ | slavery | 0.72 | 0.72 | 0.73 | 40 |
+ | suicide | 0.33 | 0.29 | 0.31 | 7 |
+ | terrorism | 0.68 | 0.57 | 0.62 | 47 |
+ | weapons | 0.89 | 0.83 | 0.86 | 138 |
+ | body_shaming | 0.9 | 0.67 | 0.77 | 109 |
+ | health_shaming | 0.84 | 0.55 | 0.66 | 108 |
+ | politics | 0.68 | 0.54 | 0.6 | 241 |
+ | racism | 0.81 | 0.59 | 0.68 | 204 |
+ | religion | 0.94 | 0.72 | 0.81 | 102 |
+ | sexual_minorities | 0.69 | 0.46 | 0.55 | 102 |
+ | sexism | 0.66 | 0.64 | 0.65 | 132 |
+ | social_injustice | 0.56 | 0.37 | 0.45 | 181 |
+ | none | 0.62 | 0.67 | 0.64 | 250 |
+ | micro avg | 0.72 | 0.61 | 0.66 | 2218 |
+ | macro avg | 0.7 | 0.6 | 0.64 | 2218 |
+ | weighted avg | 0.73 | 0.61 | 0.66 | 2218 |
+ | samples avg | 0.75 | 0.66 | 0.68 | 2218 |
+
  ## Licensing Information

  [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
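
The averaged rows in the Metrics table added by this commit can be sanity-checked against the per-topic rows. The sketch below is not part of the commit or of the authors' evaluation code; it copies the rounded precision, recall, and support values from the table and recomputes the `macro avg` and `weighted avg` rows, assuming that macro means an unweighted mean over the 19 rows and weighted means a support-weighted mean. The helper names `macro_avg` and `weighted_avg` are hypothetical. Because the inputs are already rounded to two decimals, the results are approximate; the `micro avg` and `samples avg` rows need per-sample predictions and are not recomputed here.

```python
# Per-topic rows copied from the Metrics table: (precision, recall, support).
ROWS = {
    "offline_crime": (0.65, 0.55, 132),
    "online_crime": (0.50, 0.46, 37),
    "drugs": (0.87, 0.90, 87),
    "gambling": (0.50, 0.67, 6),
    "pornography": (0.73, 0.59, 204),
    "prostitution": (0.75, 0.69, 91),
    "slavery": (0.72, 0.72, 40),
    "suicide": (0.33, 0.29, 7),
    "terrorism": (0.68, 0.57, 47),
    "weapons": (0.89, 0.83, 138),
    "body_shaming": (0.90, 0.67, 109),
    "health_shaming": (0.84, 0.55, 108),
    "politics": (0.68, 0.54, 241),
    "racism": (0.81, 0.59, 204),
    "religion": (0.94, 0.72, 102),
    "sexual_minorities": (0.69, 0.46, 102),
    "sexism": (0.66, 0.64, 132),
    "social_injustice": (0.56, 0.37, 181),
    "none": (0.62, 0.67, 250),
}


def macro_avg(values):
    """Unweighted mean: every topic counts equally (hypothetical helper)."""
    values = list(values)
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean weighted by support, so frequent topics dominate (hypothetical helper)."""
    values, weights = list(values), list(weights)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


precisions = [p for p, _, _ in ROWS.values()]
recalls = [r for _, r, _ in ROWS.values()]
supports = [s for _, _, s in ROWS.values()]

total_support = sum(supports)
macro_precision = macro_avg(precisions)
macro_recall = macro_avg(recalls)
weighted_precision = weighted_avg(precisions, supports)
weighted_recall = weighted_avg(recalls, supports)

print(f"total support:      {total_support}")
print(f"macro precision:    {macro_precision:.2f}")
print(f"macro recall:       {macro_recall:.2f}")
print(f"weighted precision: {weighted_precision:.2f}")
print(f"weighted recall:    {weighted_recall:.2f}")
```

The gap between the macro and weighted precision comes from topics with very small support, such as `gambling` (6 samples) and `suicide` (7 samples): they pull the macro average down but barely affect the weighted one.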