cmarkea/bloomz-3b-guardrail

@@ -23,34 +23,35 @@ This kind of modeling can be ideal for monitoring and controlling the output of
 Training
 --------
-The training dataset consists of 500k examples of comments in English and 500k comments in French (translated by Google Translate), each annotated with a toxicity severity graduation. The dataset used is provided by [Jigsaw](https://jigsaw.google.com/approach/) as part of a Kaggle competition : [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification/data). Since the scores represent severity graduations, regression was preferred using the following loss function:
 $$loss=l_{\mathrm{obscene}}+l_{\mathrm{sexual\_explicit}}+l_{\mathrm{identity\_attack}}+l_{\mathrm{insult}}+l_{\mathrm{threat}}$$
 with
-$$l_i=\frac{1}{\vert\mathcal{O}\vert}\sum_{o\in\mathcal{O}}\vert\mathrm{score}_{i,o}-\sigma(\mathrm{logit}_{i,o})\vert$$
 Where sigma is the sigmoid function and O represents the set of learning observations.
 Benchmark
 ---------
-As the scores range from 0 to 1, a performance measure such as RMSE may be challenging to interpret. Therefore, Pearson's inter-correlation was chosen as a measure. Pearson's inter-correlation is a measure ranging from -1 to 1, where 0 represents no correlation, -1 represents perfect negative correlation, and 1 represents perfect positive correlation. The goal is to quantitatively measure the correlation between the model's scores and the scores assigned by judges for 730 comments not seen during training.
 | Model                                                                         | Language | Obscene (x100)          | Sexual explicit (x100)        | Identity attack (x100)        | Insult (x100)        | Threat (x100)        | Mean |
-|-------------------------------------------------------------------------------|----------|:-----------------------:|-------------------------------|-------------------------------|----------------------|----------------------|------|
-| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | French   | 62                      | 73                            | 73                            | 68                   | 61                   | 67   |
-| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | English  | 63                      | 61                            | 63                            | 67                   | 55                   | 62   |
-| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail)     | French   | 72                      | 82                            | 80                            | 78                   | 77                   | 78   |
-| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail)     | English  | 76                      | 78                            | 77                            | 75                   | 79                   | 77   |
 With a correlation of approximately 65 for the 560m model and approximately 80 for the 3b model, the output is highly correlated with the judges' scores.
-Now we will focus on the MAE (Mean Absolute Error) score to measure the average gap of the estimation error with the error standard deviation.
-| Model                                                                         | Language | Obscene            | Sexual explicit       | Identity attack      | Insult             | Threat             | Mean               |
-|-------------------------------------------------------------------------------|----------|:------------------:|-----------------------|----------------------|--------------------|--------------------|--------------------|
-| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | French   | 0.06 &plusmn; 0.09 | 0.03 &plusmn; 0.07    | 0.03 &plusmn; 0.07   | 0.13 &plusmn; 0.13 | 0.04 &plusmn; 0.06 | 0.06 &plusmn; 0.08 |
-| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | English  | 0.06 &plusmn; 0.09 | 0.03 &plusmn; 0.08    | 0.03 &plusmn; 0.08   | 0.14 &plusmn; 0.13 | 0.04 &plusmn; 0.07 | 0.06 &plusmn; 0.09 |
-| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail)     | French   | 0.05 &plusmn; 0.08 | 0.02 &plusmn; 0.06    | 0.02 &plusmn; 0.06   | 0.11 &plusmn; 0.11 | 0.03 &plusmn; 0.05 | 0.05 &plusmn; 0.07 |
-| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail)     | English  | 0.05 &plusmn; 0.08 | 0.03 &plusmn; 0.07    | 0.02 &plusmn; 0.06   | 0.12 &plusmn; 0.11 | 0.03 &plusmn; 0.05 | 0.05 &plusmn; 0.07 |
 How to Use Bloomz-3b-guardrail
 --------------------------------

 Training
 --------
+The training dataset consists of 500k comments in English and 500k comments in French (translated by Google Translate), each annotated with a toxicity probability score. The dataset is provided by [Jigsaw](https://jigsaw.google.com/approach/) as part of a Kaggle competition: [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification/data). Since each score represents the probability of a toxicity mode, a cross-entropy objective was chosen:
 $$loss=l_{\mathrm{obscene}}+l_{\mathrm{sexual\_explicit}}+l_{\mathrm{identity\_attack}}+l_{\mathrm{insult}}+l_{\mathrm{threat}}$$
 with
+$$l_i=-\frac{1}{\vert\mathcal{O}\vert}\sum_{o\in\mathcal{O}}\mathrm{score}_{i,o}\log(\sigma(\mathrm{logit}_{i,o}))$$
 Where sigma is the sigmoid function and O represents the set of learning observations.
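
To make the objective concrete, here is a minimal sketch in plain Python. It is not the authors' training code. It uses the standard binary cross-entropy with soft targets, which carries a leading minus sign and a complementary `(1 - score)` term. The complementary term does not appear in the formula above and is an assumption here. The names `MODES`, `softplus`, `mode_loss`, and `total_loss` are hypothetical.

```python
import math

# The five toxicity modes summed in the total loss.
MODES = ["obscene", "sexual_explicit", "identity_attack", "insult", "threat"]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mode_loss(scores, logits):
    """Mean soft-target binary cross-entropy for a single toxicity mode.

    scores: target probabilities in [0, 1], one per observation.
    logits: raw model outputs, one per observation.
    """
    total = 0.0
    for s, z in zip(scores, logits):
        # -log(sigmoid(z)) = softplus(-z) and -log(1 - sigmoid(z)) = softplus(z).
        # ASSUMPTION: the (1 - s) term follows the standard definition,
        # it is not written in the formula of the model card.
        total += s * softplus(-z) + (1.0 - s) * softplus(z)
    return total / len(scores)


def total_loss(scores_by_mode, logits_by_mode):
    """Sum of the per-mode losses, as in loss = l_obscene + ... + l_threat."""
    return sum(mode_loss(scores_by_mode[m], logits_by_mode[m]) for m in MODES)
```

Writing both log terms through `softplus` avoids overflow for large positive or negative logits, which a direct `math.log(sigmoid(z))` would not handle.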
 Benchmark
 ---------
+Pearson's correlation coefficient was chosen as the evaluation measure. It ranges from -1 to 1, where 0 indicates no correlation, -1 a perfect negative correlation, and 1 a perfect positive correlation. The goal is to quantify the correlation between the model's scores and the scores assigned by judges on 730 comments not seen during training. The table reports correlations multiplied by 100.
 | Model                                                                         | Language | Obscene (x100)          | Sexual explicit (x100)        | Identity attack (x100)        | Insult (x100)        | Threat (x100)        | Mean |
+|------------------------------------------------------------------------------:|:---------|:-----------------------:|:-----------------------------:|:-----------------------------:|:--------------------:|:--------------------:|:----:|
+| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | French   | 64                      | 74                            | 72                            | 70                   | 58                   | 68   |
+| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | English  | 63                      | 63                            | 62                            | 70                   | 51                   | 62   |
+| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail)     | French   | 71                      | 82                            | 84                            | 77                   | 77                   | 78   |
+| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail)     | English  | 74                      | 76                            | 79                            | 76                   | 79                   | 77   |
 With a correlation of approximately 65 for the 560m model and approximately 80 for the 3b model, the output is highly correlated with the judges' scores.
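
As an illustration of how one cell of the table above could be computed, the sketch below implements Pearson's correlation in plain Python and scales it by 100. The function names and the idea of passing two aligned lists (judge scores and model scores for one toxicity mode) are assumptions for the example, not the authors' evaluation script.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Unnormalised covariance and variances; the 1/n factors cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def table_cell(judge_scores, model_scores):
    """Correlation expressed as in the table: multiplied by 100 and rounded."""
    return round(100 * pearson(judge_scores, model_scores))
```

The "Mean" column would then be the average of the five per-mode values for a given model and language.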
+Taking the maximum over the different toxicity modes yields a score very close to the target toxicity of the original dataset, with a correlation of 0.976 and a mean absolute error of 0.013&plusmn;0.04. This maximum is therefore a robust proxy for evaluating the overall performance of the model, beyond the rare toxicity modes. Thresholding the target toxicity at &ge; 0.5 gives 240 positive cases out of 730 observations. On this binary target we report the Precision-Recall AUC, ROC AUC, accuracy, and F1-score.
+| Model                                                                         | Language | PR AUC (%)    | ROC AUC (%)       | Accuracy (%)       | F1-score (%)    |
+|------------------------------------------------------------------------------:|:---------|:-------------:|:-----------------:|:------------------:|:---------------:|
+| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | French   | 77            | 85                | 78                 | 60              |
+| [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | English  | 77            | 84                | 79                 | 62              |
+| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail)     | French   | 82            | 89                | 84                 | 72              |
+| [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail)     | English  | 80            | 88                | 82                 | 70              |
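
The evaluation just described can be sketched in plain Python: take the maximum over the mode scores, binarize the target at 0.5, then compute the metrics. This is a hedged illustration only. All function names are hypothetical, and thresholding the model's predictions at 0.5 for accuracy and F1 is an assumption, since the text only states the threshold used for the target.

```python
def overall_toxicity(mode_scores):
    """Overall score taken as the maximum over the per-mode scores."""
    return max(mode_scores.values())


def binarize(scores, threshold=0.5):
    """Positive when the score is >= threshold."""
    return [s >= threshold for s in scores]


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score from boolean labels and boolean predictions."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a positive outranks a negative.

    Ties count for one half. This pairwise form is quadratic in the number
    of observations, which is fine for a few hundred comments.
    """
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A typical use would call `overall_toxicity` on each comment's five predicted scores, `binarize` on the target toxicity, and then pass the results to `accuracy_and_f1` and `roc_auc`.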
 How to Use Bloomz-3b-guardrail
 --------------------------------