Update README.md
README.md
CHANGED
@@ -23,34 +23,35 @@ This kind of modeling can be ideal for monitoring and controlling the output of
|
|
23 |
Training
|
24 |
--------
|
25 |
|
26 |
-
The training dataset consists of 500k examples of comments in English and 500k comments in French (translated by Google Translate), each annotated with a toxicity severity
|
27 |
$$loss=l_{\mathrm{obscene}}+l_{\mathrm{sexual\_explicit}}+l_{\mathrm{identity\_attack}}+l_{\mathrm{insult}}+l_{\mathrm{threat}}$$
|
28 |
with
|
29 |
-
$$l_i=\frac{1}{\vert\mathcal{O}\vert}\sum_{o\in\mathcal{O}}\
|
30 |
Where sigma is the sigmoid function and O represents the set of learning observations.
|
31 |
|
32 |
Benchmark
|
33 |
---------
|
34 |
|
35 |
-
|
36 |
|
37 |
| Model | Language | Obscene (x100) | Sexual explicit (x100) | Identity attack (x100) | Insult (x100) | Threat (x100) | Mean |
|
38 |
-
|
39 |
-
| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | French |
|
40 |
-
| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | English | 63 |
|
41 |
-
| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | French |
|
42 |
-
| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | English |
|
43 |
|
44 |
With a correlation of approximately 65 for the 560m model and approximately 80 for the 3b model, the output is highly correlated with the judges' scores.
|
45 |
|
46 |
-
Now we will focus on the MAE (Mean Absolute Error) score to measure the average gap of the estimation error with the error standard deviation.
|
47 |
|
48 |
-
|
49 |
-
|
50 |
-
|
|
51 |
-
|
52 |
-
| [Bloomz-
|
53 |
-
| [Bloomz-
|
|
|
|
|
54 |
|
55 |
How to Use Bloomz-3b-guardrail
|
56 |
--------------------------------
|
|
|
23 |
Training
|
24 |
--------
|
25 |
|
26 |
+
The training dataset consists of 500k examples of comments in English and 500k comments in French (translated by Google Translate), each annotated with a toxicity severity probability. The dataset is provided by [Jigsaw](https://jigsaw.google.com/approach/) as part of a Kaggle competition: [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification/data). As each score represents the probability of a toxicity mode, a cross-entropy-type optimization objective was chosen:
|
27 |
$$loss=l_{\mathrm{obscene}}+l_{\mathrm{sexual\_explicit}}+l_{\mathrm{identity\_attack}}+l_{\mathrm{insult}}+l_{\mathrm{threat}}$$
|
28 |
with
|
29 |
+
$$l_i=-\frac{1}{\vert\mathcal{O}\vert}\sum_{o\in\mathcal{O}}\mathrm{score}_{i,o}\log(\sigma(\mathrm{logit}_{i,o}))$$
|
30 |
Where sigma is the sigmoid function and O represents the set of learning observations.
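
As a rough illustration of the objective above, here is a minimal pure-Python sketch. It assumes a leading minus sign on each per-mode term (the usual cross-entropy convention, so that the loss is non-negative and minimized), and the function names and the dictionary layout are hypothetical, not taken from the training code.

```python
import math

# Toxicity modes summed in the total loss (taken from the formula above).
MODES = ["obscene", "sexual_explicit", "identity_attack", "insult", "threat"]


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    return min(z, 0.0) - math.log1p(math.exp(-abs(z)))


def mode_loss(scores, logits):
    """l_i = -(1/|O|) * sum_o score_{i,o} * log(sigmoid(logit_{i,o})).

    The leading minus sign is an assumption of this sketch.
    """
    total = sum(s * log_sigmoid(z) for s, z in zip(scores, logits))
    return -total / len(scores)


def total_loss(scores_by_mode, logits_by_mode):
    """Sum of the per-mode losses over the five toxicity modes."""
    return sum(mode_loss(scores_by_mode[m], logits_by_mode[m]) for m in MODES)


# Hypothetical toy batch: two observations per mode.
toy_scores = {m: [1.0, 0.0] for m in MODES}
toy_logits = {m: [0.0, -2.0] for m in MODES}
print(total_loss(toy_scores, toy_logits))
```

With this convention, an observation whose target score is 0 contributes nothing to the term, while a target score of 1 paired with a logit of 0 contributes log(2).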
|
31 |
|
32 |
Benchmark
|
33 |
---------
|
34 |
|
35 |
+
Pearson's correlation coefficient was chosen as the evaluation metric. It ranges from -1 to 1, where 0 represents no correlation, -1 represents perfect negative correlation, and 1 represents perfect positive correlation. The goal is to quantitatively measure the correlation between the model's scores and the scores assigned by judges for 730 comments not seen during training.
|
36 |
|
37 |
| Model | Language | Obscene (x100) | Sexual explicit (x100) | Identity attack (x100) | Insult (x100) | Threat (x100) | Mean |
|
38 |
+
|------------------------------------------------------------------------------:|:---------|:-----------------------:|:-----------------------------:|:-----------------------------:|:--------------------:|:--------------------:|:----:|
|
39 |
+
| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | French | 64 | 74 | 72 | 70 | 58 | 68 |
|
40 |
+
| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | English | 63 | 63 | 62 | 70 | 51 | 62 |
|
41 |
+
| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | French | 71 | 82 | 84 | 77 | 77 | 78 |
|
42 |
+
| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | English | 74 | 76 | 79 | 76 | 79 | 77 |
|
43 |
|
44 |
With a correlation of approximately 65 for the 560m model and approximately 80 for the 3b model, the output is highly correlated with the judges' scores.
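
For readers who want to reproduce this kind of figure, the Pearson correlation reported in the table can be computed along these lines. This is only a sketch: the `pearson` helper and the toy score lists are hypothetical, and the actual evaluation script is not shown in this README.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: model scores vs. judge scores for one toxicity mode.
model_scores = [0.05, 0.80, 0.30, 0.95, 0.10]
judge_scores = [0.00, 0.70, 0.40, 0.90, 0.20]

# The table reports the coefficient multiplied by 100.
print(round(100 * pearson(model_scores, judge_scores)))
```

In the benchmark, one such coefficient would be computed per toxicity mode and per language, and the "Mean" column averages the five modes.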
|
45 |
|
|
|
46 |
|
47 |
+
Taking the maximum over the different toxicity modes yields a score extremely close to the target toxicity of the original dataset, with a correlation of 0.976 and a mean absolute error of 0.013±0.04. This aggregate therefore serves as a robust approximation for evaluating the overall performance of the model, beyond the rare toxicity modes. Applying a toxicity threshold ≥ 0.5 to create the binary target gives 240 positive cases out of 730 observations, on which we compute the Precision-Recall AUC, ROC AUC, accuracy, and F1-score.
|
48 |
+
|
49 |
+
| Model | Language | PR AUC (%) | ROC AUC (%) | Accuracy (%) | F1-score (%) |
|
50 |
+
|------------------------------------------------------------------------------:|:---------|:-------------:|:-----------------:|:------------------:|:---------------:|
|
51 |
+
| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | French | 77 | 85 | 78 | 60 |
|
52 |
+
| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | English | 77 | 84 | 79 | 62 |
|
53 |
+
| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | French | 82 | 89 | 84 | 72 |
|
54 |
+
| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | English | 80 | 88 | 82 | 70 |
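
The aggregation and thresholding described above can be sketched as follows. The helper names are hypothetical, and applying the same 0.5 threshold to the model's score to obtain a predicted class is an assumption of this sketch (the text only states the threshold used for the target). PR AUC is omitted for brevity.

```python
def overall_toxicity(mode_scores):
    """Aggregate per-mode scores into one score by taking the maximum."""
    return max(mode_scores.values())


def binarize(scores, threshold=0.5):
    """Label a score as toxic when it is >= threshold."""
    return [s >= threshold for s in scores]


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for boolean labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1


def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: target scores and model outputs for four comments.
targets = binarize([0.9, 0.6, 0.2, 0.1])
model = [overall_toxicity(m) for m in (
    {"obscene": 0.9, "insult": 0.3},
    {"obscene": 0.1, "insult": 0.4},
    {"obscene": 0.6, "insult": 0.2},
    {"obscene": 0.0, "insult": 0.1},
)]
print(accuracy_and_f1(targets, binarize(model)), roc_auc(targets, model))
```

The pairwise ROC AUC is quadratic in the number of observations, which is fine for a test set of 730 comments.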
|
55 |
|
56 |
How to Use Bloomz-3b-guardrail
|
57 |
--------------------------------
|