Cyrile committed on
Commit
a98d775
1 Parent(s): f38485b

Update README.md

  1. README.md +16 -15
README.md CHANGED
@@ -23,34 +23,35 @@ This kind of modeling can be ideal for monitoring and controlling the output of
23
  Training
24
  --------
25
 
26
- The training dataset consists of 500k comments in English and 500k comments in French (translated by Google Translate), each annotated with a graded toxicity severity score. The dataset is provided by [Jigsaw](https://jigsaw.google.com/approach/) as part of the Kaggle competition [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification/data). Since the scores represent severity grades, regression was preferred, using the following loss function:
27
  $$loss=l_{\mathrm{obscene}}+l_{\mathrm{sexual\_explicit}}+l_{\mathrm{identity\_attack}}+l_{\mathrm{insult}}+l_{\mathrm{threat}}$$
28
  with
29
- $$l_i=\frac{1}{\vert\mathcal{O}\vert}\sum_{o\in\mathcal{O}}\vert\mathrm{score}_{i,o}-\sigma(\mathrm{logit}_{i,o})\vert$$
30
  Where sigma is the sigmoid function and O represents the set of learning observations.
31
 
32
  Benchmark
33
  ---------
34
 
35
- As the scores range from 0 to 1, a performance measure such as RMSE may be challenging to interpret. Therefore, the Pearson correlation coefficient was chosen as the measure. It ranges from -1 to 1, where 0 represents no correlation, -1 perfect negative correlation, and 1 perfect positive correlation. The goal is to quantify the correlation between the model's scores and the scores assigned by judges for 730 comments not seen during training.
36
 
37
  | Model | Language | Obscene (x100) | Sexual explicit (x100) | Identity attack (x100) | Insult (x100) | Threat (x100) | Mean (x100) |
38
- |-------------------------------------------------------------------------------|----------|:-----------------------:|-------------------------------|-------------------------------|----------------------|----------------------|------|
39
- | [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | French | 62 | 73 | 73 | 68 | 61 | 67 |
40
- | [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | English | 63 | 61 | 63 | 67 | 55 | 62 |
41
- | [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | French | 72 | 82 | 80 | 78 | 77 | 78 |
42
- | [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | English | 76 | 78 | 77 | 75 | 79 | 77 |
43
 
44
  With a correlation of approximately 65 for the 560m model and approximately 80 for the 3b model, the output is highly correlated with the judges' scores.
45
 
46
- We now turn to the MAE (Mean Absolute Error), reported with the standard deviation of the error, to measure the average size of the estimation error.
47
 
48
- | Model | Language | Obscene | Sexual explicit | Identity attack | Insult | Threat | Mean |
49
- |-------------------------------------------------------------------------------|----------|:------------------:|-----------------------|----------------------|--------------------|--------------------|--------------------|
50
- | [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | French | 0.06 ± 0.09 | 0.03 ± 0.07 | 0.03 ± 0.07 | 0.13 ± 0.13 | 0.04 ± 0.06 | 0.06 ± 0.08 |
51
- | [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | English | 0.06 ± 0.09 | 0.03 ± 0.08 | 0.03 ± 0.08 | 0.14 ± 0.13 | 0.04 ± 0.07 | 0.06 ± 0.09 |
52
- | [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | French | 0.05 ± 0.08 | 0.02 ± 0.06 | 0.02 ± 0.06 | 0.11 ± 0.11 | 0.03 ± 0.05 | 0.05 ± 0.07 |
53
- | [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | English | 0.05 ± 0.08 | 0.03 ± 0.07 | 0.02 ± 0.06 | 0.12 ± 0.11 | 0.03 ± 0.05 | 0.05 ± 0.07 |
 
 
54
 
55
  How to Use Bloomz-3b-guardrail
56
  --------------------------------
 
23
  Training
24
  --------
25
 
26
+ The training dataset consists of 500k comments in English and 500k comments in French (translated by Google Translate), each annotated with a toxicity severity score interpreted as a probability. The dataset is provided by [Jigsaw](https://jigsaw.google.com/approach/) as part of the Kaggle competition [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification/data). Since each score represents the probability of a toxicity mode, a cross-entropy optimization objective was chosen:
27
  $$loss=l_{\mathrm{obscene}}+l_{\mathrm{sexual\_explicit}}+l_{\mathrm{identity\_attack}}+l_{\mathrm{insult}}+l_{\mathrm{threat}}$$
28
  with
29
+ $$l_i=-\frac{1}{\vert\mathcal{O}\vert}\sum_{o\in\mathcal{O}}\mathrm{score}_{i,o}\log(\sigma(\mathrm{logit}_{i,o}))$$
30
  Where sigma is the sigmoid function and O represents the set of learning observations.
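
The per-mode loss and its sum over the five toxicity modes can be sketched in plain Python. This is only an illustration, not the training code: the function names and the toy data are hypothetical, and the sketch assumes the standard binary cross-entropy with soft targets (a leading minus sign and the complementary `(1 - score)` term), which may differ from the exact objective used for training.

```python
import math

# The five toxicity modes summed in the total loss.
MODES = ["obscene", "sexual_explicit", "identity_attack", "insult", "threat"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mode_loss(scores, logits, eps=1e-12):
    """Mean binary cross-entropy between soft targets and sigmoid(logits).

    Assumption: standard BCE, including the (1 - score) term.
    """
    total = 0.0
    for s, z in zip(scores, logits):
        # Clamp the probability so that log() never receives 0.
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= s * math.log(p) + (1.0 - s) * math.log(1.0 - p)
    return total / len(scores)


def total_loss(scores_by_mode, logits_by_mode):
    """Sum of the per-mode losses over the five modes."""
    return sum(mode_loss(scores_by_mode[m], logits_by_mode[m]) for m in MODES)


# Toy data (hypothetical): three observations per mode.
toy_scores = {m: [0.0, 0.5, 1.0] for m in MODES}
toy_logits = {m: [-2.0, 0.0, 2.0] for m in MODES}
print(total_loss(toy_scores, toy_logits))
```

With soft targets, the loss of a single observation is minimal when `sigmoid(logit)` equals the annotated score, which is why a probability-valued annotation fits this objective.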
31
 
32
  Benchmark
33
  ---------
34
 
35
+ The Pearson correlation coefficient was chosen as the measure. It ranges from -1 to 1, where 0 represents no correlation, -1 perfect negative correlation, and 1 perfect positive correlation. The goal is to quantify the correlation between the model's scores and the scores assigned by judges for 730 comments not seen during training.
36
 
37
  | Model | Language | Obscene (x100) | Sexual explicit (x100) | Identity attack (x100) | Insult (x100) | Threat (x100) | Mean (x100) |
38
+ |------------------------------------------------------------------------------:|:---------|:-----------------------:|:-----------------------------:|:-----------------------------:|:--------------------:|:--------------------:|:----:|
39
+ | [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | French | 64 | 74 | 72 | 70 | 58 | 68 |
40
+ | [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | English | 63 | 63 | 62 | 70 | 51 | 62 |
41
+ | [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | French | 71 | 82 | 84 | 77 | 77 | 78 |
42
+ | [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | English | 74 | 76 | 79 | 76 | 79 | 77 |
43
 
44
  With a correlation of approximately 65 for the 560m model and approximately 80 for the 3b model, the output is highly correlated with the judges' scores.
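
For reference, the Pearson correlation reported in the table can be computed as in the sketch below. The function name and the toy score lists are hypothetical and are not taken from the benchmark; the table values correspond to this coefficient multiplied by 100.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical): model scores vs. judge scores for one mode.
model_scores = [0.05, 0.10, 0.40, 0.80, 0.95]
judge_scores = [0.00, 0.20, 0.30, 0.70, 1.00]

r = pearson(model_scores, judge_scores)
print(round(100 * r))  # same x100 scale as the table
```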
45
 
 
46
 
47
+ Taking the maximum over the different toxicity modes yields a score extremely close to the target toxicity of the original dataset, with a correlation of 0.976 and a mean absolute error of 0.013 ± 0.04. This maximum therefore serves as a robust approximation for evaluating the overall performance of the model, beyond rare toxicity modes. Thresholding the target toxicity at ≥ 0.5 gives 240 positive cases out of 730 observations. On this binary target, we report the Precision-Recall AUC, ROC AUC, accuracy, and F1-score.
48
+
49
+ | Model | Language | PR AUC (%) | ROC AUC (%) | Accuracy (%) | F1-score (%) |
50
+ |------------------------------------------------------------------------------:|:---------|:-------------:|:-----------------:|:------------------:|:---------------:|
51
+ | [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | French | 77 | 85 | 78 | 60 |
52
+ | [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | English | 77 | 84 | 79 | 62 |
53
+ | [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | French | 82 | 89 | 84 | 72 |
54
+ | [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | English | 80 | 88 | 82 | 70 |
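
The evaluation described above can be sketched as follows: collapse the per-mode scores with a maximum, then compare against a binarized target. All names and toy values are hypothetical. Applying the same 0.5 threshold to the model score for accuracy and F1 is an assumption, since the source only states the threshold used for the target. PR AUC is omitted to keep the sketch short.

```python
def overall_score(mode_scores):
    """Collapse per-mode scores into one toxicity score via the maximum."""
    return max(mode_scores.values())


def binarize(score, threshold=0.5):
    """Label a score as toxic (1) when it reaches the threshold."""
    return 1 if score >= threshold else 0


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a positive outranks a negative.

    Ties count for one half. Needs at least one positive and one negative.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): per-mode model outputs and target toxicity.
predictions = [
    {"obscene": 0.9, "insult": 0.6, "threat": 0.1},
    {"obscene": 0.1, "insult": 0.2, "threat": 0.0},
    {"obscene": 0.2, "insult": 0.4, "threat": 0.1},
    {"obscene": 0.0, "insult": 0.3, "threat": 0.1},
]
target_toxicity = [0.8, 0.1, 0.6, 0.2]

scores = [overall_score(p) for p in predictions]
y_true = [binarize(t) for t in target_toxicity]
y_pred = [binarize(s) for s in scores]

print(accuracy_and_f1(y_true, y_pred), roc_auc(y_true, scores))
```

The ROC AUC is computed on the continuous maximum score, so it does not depend on the prediction threshold, whereas accuracy and F1 do.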
55
 
56
  How to Use Bloomz-3b-guardrail
57
  --------------------------------