cointegrated committed
Commit 6bc8969 (parent: 77f6923)

Update README.md

Files changed (1): README.md (+6 -6)
README.md CHANGED
@@ -45,22 +45,22 @@ def text2toxicity(text, aggregate=True):
     return proba

 print(text2toxicity('я люблю нигеров', True))
-# 0.57240640889815
+# 0.9350118728093193

 print(text2toxicity('я люблю нигеров', False))
-# [9.9336821e-01 6.1555761e-03 1.2781911e-03 9.2758919e-04 5.6955177e-01]
+# [0.9715758 0.0180863 0.0045551 0.00189755 0.9331106 ]

 print(text2toxicity(['я люблю нигеров', 'я люблю африканцев'], True))
-# [0.5724064 0.20111847]
+# [0.93501186 0.04156357]

 print(text2toxicity(['я люблю нигеров', 'я люблю африканцев'], False))
-# [[9.9336821e-01 6.1555761e-03 1.2781911e-03 9.2758919e-04 5.6955177e-01]
-# [9.9828428e-01 1.1138428e-03 1.1492912e-03 4.6551935e-04 1.9974548e-01]]
+# [[9.7157580e-01 1.8086294e-02 4.5550885e-03 1.8975559e-03 9.3311059e-01]
+# [9.9979788e-01 1.9048342e-04 1.5297388e-04 1.7452303e-04 4.1369814e-02]]
 ```

 ## Training

-The model has been trained on the joint dataset of [OK ML Cup](https://cups.mail.ru/ru/tasks/1048) and [Babakov et.al.](https://arxiv.org/abs/2103.05345) with `Adam` optimizer, learning rate of `1e-5`, and batch size of `64` for `15` epochs. A text was considered inappropriate if its inappropritateness score was higher than 0.8, and appropriate - if it was lower than 0.2. The per-label ROC AUC on the dev set is:
+The model has been trained on the joint dataset of [OK ML Cup](https://cups.mail.ru/ru/tasks/1048) and [Babakov et.al.](https://arxiv.org/abs/2103.05345) with `Adam` optimizer, the learning rate of `1e-5`, and batch size of `64` for `15` epochs. A text was considered inappropriate if its inappropriateness score was higher than 0.8, and appropriate - if it was lower than 0.2. The per-label ROC AUC on the dev set is:
 ```
 non-toxic : 0.9937
 insult : 0.9912
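
The hunk shows only the tail of the usage snippet (the `return proba` line and the `print` calls). Below is a minimal, hypothetical sketch of how such a `text2toxicity` helper is typically assembled with `transformers`; the checkpoint id and label order are assumptions rather than facts from this commit, and the aggregation rule is included only because it reproduces the printed outputs (e.g. 1 - 0.9715758 * (1 - 0.9331106) ≈ 0.9350).

```python
# Hypothetical reconstruction of the helper around `return proba`; names are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_checkpoint = 'cointegrated/rubert-tiny-toxicity'  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint)

def text2toxicity(text, aggregate=True):
    """Return one aggregate score (aggregate=True) or per-label sigmoid probabilities."""
    with torch.no_grad():
        inputs = tokenizer(text, return_tensors='pt', truncation=True, padding=True)
        proba = torch.sigmoid(model(**inputs).logits).cpu().numpy()
    if isinstance(text, str):
        proba = proba[0]  # unwrap the batch dimension for a single string
    if aggregate:
        # 1 - P(non-toxic) * (1 - P(last label)); consistent with the outputs printed above
        return 1 - proba.T[0] * (1 - proba.T[-1])
    return proba
```

The per-label vector reads as independent sigmoid probabilities (in the printed output the first entry, `non-toxic`, and the last entry are both high at once), which is why a separate aggregation step is needed to produce a single toxicity score.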
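The updated training note binarizes the soft inappropriateness scores from Babakov et al. at thresholds of 0.8 and 0.2. A small illustrative sketch of that rule follows; the file and column names are invented for illustration, and dropping the ambiguous middle band is an assumption, not something the commit states.

```python
# Illustrative only: file and column names are hypothetical, not from the commit.
import pandas as pd

df = pd.read_csv('inappropriateness_scores.csv')  # hypothetical dump with a float `score` column

def binarize(score, hi=0.8, lo=0.2):
    """Map a soft inappropriateness score to a hard label; None marks the ambiguous band."""
    if score > hi:
        return 1.0   # inappropriate
    if score < lo:
        return 0.0   # appropriate
    return None      # between the thresholds; assumed to be excluded from training

df['label'] = df['score'].map(binarize)
df = df.dropna(subset=['label'])  # keep only confidently labelled texts
```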