cointegrated committed
Commit 6bc8969 (parent: 77f6923)

Update README.md

Files changed (1): README.md (+6 -6)
README.md CHANGED
@@ -45,22 +45,22 @@ def text2toxicity(text, aggregate=True):
     return proba

 print(text2toxicity('я люблю нигеров', True))
-# 0.57240640889815
+# 0.9350118728093193

 print(text2toxicity('я люблю нигеров', False))
-# [9.9336821e-01 6.1555761e-03 1.2781911e-03 9.2758919e-04 5.6955177e-01]
+# [0.9715758 0.0180863 0.0045551 0.00189755 0.9331106 ]

 print(text2toxicity(['я люблю нигеров', 'я люблю африканцев'], True))
-# [0.5724064 0.20111847]
+# [0.93501186 0.04156357]

 print(text2toxicity(['я люблю нигеров', 'я люблю африканцев'], False))
-# [[9.9336821e-01 6.1555761e-03 1.2781911e-03 9.2758919e-04 5.6955177e-01]
-# [9.9828428e-01 1.1138428e-03 1.1492912e-03 4.6551935e-04 1.9974548e-01]]
+# [[9.7157580e-01 1.8086294e-02 4.5550885e-03 1.8975559e-03 9.3311059e-01]
+# [9.9979788e-01 1.9048342e-04 1.5297388e-04 1.7452303e-04 4.1369814e-02]]
 ```

 ## Training

-The model has been trained on the joint dataset of [OK ML Cup](https://cups.mail.ru/ru/tasks/1048) and [Babakov et.al.](https://arxiv.org/abs/2103.05345) with `Adam` optimizer, learning rate of `1e-5`, and batch size of `64` for `15` epochs. A text was considered inappropriate if its inappropritateness score was higher than 0.8, and appropriate - if it was lower than 0.2. The per-label ROC AUC on the dev set is:
+The model has been trained on the joint dataset of [OK ML Cup](https://cups.mail.ru/ru/tasks/1048) and [Babakov et.al.](https://arxiv.org/abs/2103.05345) with `Adam` optimizer, the learning rate of `1e-5`, and batch size of `64` for `15` epochs. A text was considered inappropriate if its inappropriateness score was higher than 0.8, and appropriate - if it was lower than 0.2. The per-label ROC AUC on the dev set is:
 ```
 non-toxic : 0.9937
 insult : 0.9912
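
The hunk shows only the tail of the usage snippet (the `return proba` line and the `print` calls). Below is a minimal, hypothetical sketch of how such a `text2toxicity` helper is typically assembled with `transformers`; the checkpoint id and label order are assumptions rather than facts from this commit, and the aggregation rule is included only because it reproduces the printed outputs (e.g. 1 - 0.9715758 * (1 - 0.9331106) ≈ 0.9350).

```python
# Hypothetical reconstruction of the helper around `return proba`; names are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_checkpoint = 'cointegrated/rubert-tiny-toxicity'  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint)

def text2toxicity(text, aggregate=True):
    """Return one aggregate score (aggregate=True) or per-label sigmoid probabilities."""
    with torch.no_grad():
        inputs = tokenizer(text, return_tensors='pt', truncation=True, padding=True)
        proba = torch.sigmoid(model(**inputs).logits).cpu().numpy()
    if isinstance(text, str):
        proba = proba[0]  # unwrap the batch dimension for a single string
    if aggregate:
        # 1 - P(non-toxic) * (1 - P(last label)); consistent with the outputs printed above
        return 1 - proba.T[0] * (1 - proba.T[-1])
    return proba
```

The per-label vector reads as independent sigmoid probabilities (in the printed output the first entry, `non-toxic`, and the last entry are both high at once), which is why a separate aggregation step is needed to produce a single toxicity score.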
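The updated training note binarizes the soft inappropriateness scores from Babakov et al. at thresholds of 0.8 and 0.2. A small illustrative sketch of that rule follows; the file and column names are invented for illustration, and dropping the ambiguous middle band is an assumption, not something the commit states.

```python
# Illustrative only: file and column names are hypothetical, not from the commit.
import pandas as pd

df = pd.read_csv('inappropriateness_scores.csv')  # hypothetical dump with a float `score` column

def binarize(score, hi=0.8, lo=0.2):
    """Map a soft inappropriateness score to a hard label; None marks the ambiguous band."""
    if score > hi:
        return 1.0   # inappropriate
    if score < lo:
        return 0.0   # appropriate
    return None      # between the thresholds; assumed to be excluded from training

df['label'] = df['score'].map(binarize)
df = df.dropna(subset=['label'])  # keep only confidently labelled texts
```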