Cyrile committed on
Commit
9f4db8b
1 Parent(s): 78a958b

Update README.md

Files changed (1)
  1. README.md +69 -0
README.md CHANGED
@@ -1,3 +1,72 @@
  ---
  license: bigscience-bloom-rail-1.0
+ language:
+ - fr
+ - en
+ pipeline_tag: text-classification
  ---
+
+
+ Bloomz-3b-guardrail
+ ---------------------
+
+ We introduce Bloomz-3b-guardrail, a fine-tuned version of the [Bloomz-3b-sft-chat](https://huggingface.co/cmarkea/bloomz-3b-sft-chat) model. It is designed to detect the toxicity of a text along five modes:
+
+ * Obscene: Content that is offensive, indecent, or morally inappropriate, especially in relation to social norms or standards of decency.
+ * Sexual explicit: Content that presents explicit sexual aspects in a clear and detailed manner.
+ * Identity attack: Content that aims to attack, denigrate, or harass someone based on their identity, especially related to characteristics such as race, gender, sexual orientation, religion, ethnic origin, or other personal aspects.
+ * Insult: Offensive, disrespectful, or hurtful content used to attack or denigrate a person.
+ * Threat: Content that presents a direct threat to an individual.
+
+ Training
+ --------
+
+ The training dataset consists of 500k comments in English and 500k comments in French (translated by Google Translate), each annotated with graded toxicity severity scores. The dataset is provided by [Jigsaw](https://jigsaw.google.com/) as part of a Kaggle competition: [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification/data). Since the scores are graded severities rather than binary labels, training was framed as a regression with the following loss function:
+ $$\mathrm{loss}=l_{\mathrm{obscene}}+l_{\mathrm{sexual\_explicit}}+l_{\mathrm{identity\_attack}}+l_{\mathrm{insult}}+l_{\mathrm{threat}}$$
+ with
+ $$l_i=\frac{1}{\vert\mathcal{O}\vert}\sum_{o\in\mathcal{O}}\vert\mathrm{score}_{i,o}-\sigma(\mathrm{logit}_{i,o})\vert$$
+ where $\sigma$ is the sigmoid function and $\mathcal{O}$ is the set of training observations.
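
The loss above can be sketched in plain Python to make the arithmetic concrete. This is only an illustration of the formula, not the actual training code: the function names, the label order, and the list-of-rows input layout are assumptions made for this example.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-x))


# Label order is an assumption made for this sketch.
LABELS = ["obscene", "sexual_explicit", "identity_attack", "insult", "threat"]


def guardrail_loss(scores, logits):
    """Sum over the five labels of the mean absolute error between
    the target score and sigmoid(logit).

    scores, logits: one row per observation, one value per label.
    """
    n_obs = len(scores)
    total = 0.0
    for i in range(len(LABELS)):
        # l_i: mean absolute error for label i over all observations
        l_i = sum(
            abs(scores[o][i] - sigmoid(logits[o][i])) for o in range(n_obs)
        ) / n_obs
        total += l_i
    return total
```

With all logits at 0 the sigmoid outputs 0.5 for every label, so targets of 0.5 give a loss of 0, while targets of 0 or 1 give 0.5 per label.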
+
+ Benchmark
+ ---------
+
+ As the scores range from 0 to 1, a performance measure such as MAE or RMSE may be challenging to interpret. The Pearson correlation coefficient was therefore chosen as the metric. It ranges from -1 to 1, where 0 represents no correlation, -1 a perfect negative correlation, and 1 a perfect positive correlation. The goal is to quantify the correlation between the model's scores and the scores assigned by judges on 750 comments not seen during training.
+
+ | Model | Language | Obscene (x100) | Sexual explicit (x100) | Identity attack (x100) | Insult (x100) | Threat (x100) | Mean (x100) |
+ |-------------------------------------------------------------------------------|----------|:--------------:|:----------------------:|:----------------------:|:-------------:|:-------------:|:-----------:|
+ | [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | French | 62 | 73 | 73 | 68 | 61 | 67 |
+ | [Bloomz-560m-guardrail](https://huggingface.co/cmarkea/bloomz-560m-guardrail) | English | 63 | 61 | 63 | 67 | 55 | 62 |
+ | [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | French | 72 | 82 | 80 | 78 | 77 | 78 |
+ | [Bloomz-3b-guardrail](https://huggingface.co/cmarkea/bloomz-3b-guardrail) | English | 76 | 78 | 77 | 75 | 79 | 77 |
+
+ With mean correlations (x100) of 62 to 67 for the 560m model and 77 to 78 for the 3b model, both models' outputs correlate well with the judges' scores, the 3b model markedly more so.
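
For readers who want to reproduce this kind of measurement on their own data, the Pearson correlation between judge scores and model scores can be computed as in the sketch below. The `pearson` helper and the toy values are assumptions for illustration; they are not the benchmark data or the evaluation script used here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances, all without the 1/n factor (it cancels out)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy values, not taken from the benchmark.
judge_scores = [0.0, 0.2, 0.5, 0.9]
model_scores = [0.1, 0.3, 0.4, 0.8]

r = pearson(judge_scores, model_scores)
print(round(100 * r))  # same x100 scale as the table
```

Because the coefficient is invariant to shifts and positive rescaling, it rewards a model that ranks and spaces comments like the judges do, even if its absolute scores are offset.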
+
+ How to Use Bloomz-3b-guardrail
+ ------------------------------
+
+ The following example uses the `pipeline` API of the Transformers library.
+
+ ```python
+ from transformers import pipeline
+
+ guardrail = pipeline("text-classification", "cmarkea/bloomz-3b-guardrail")
+
+ list_text = [...]
+ result = guardrail(
+     list_text,
+     # Return a score for every toxicity mode, not only the top label.
+     # Recent Transformers releases deprecate this flag in favor of top_k=None.
+     return_all_scores=True,
+     # Apply a sigmoid so that each score lies between 0 and 1.
+     function_to_apply='sigmoid'
+ )
+ ```
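
Once scores are returned, a caller typically needs to decide which modes to flag. The sketch below assumes the pipeline returns, for each input text, a list of `{"label": ..., "score": ...}` dictionaries; the `flag_toxic` helper, the 0.5 threshold, and the label names in the mock data are hypothetical and should be adapted to the model's actual labels and to the tolerance of the application.

```python
def flag_toxic(result, threshold=0.5):
    """For each text, keep only the labels whose score reaches the threshold."""
    flagged = []
    for scores in result:
        flagged.append(
            {
                item["label"]: item["score"]
                for item in scores
                if item["score"] >= threshold
            }
        )
    return flagged


# Mock output for two texts; label names and scores are illustrative only.
mock_result = [
    [{"label": "obscene", "score": 0.02}, {"label": "insult", "score": 0.81}],
    [{"label": "obscene", "score": 0.01}, {"label": "insult", "score": 0.03}],
]

print(flag_toxic(mock_result))
```

An empty dictionary for a text means no mode reached the threshold. Since the model was trained as a regressor on severity scores, the threshold is a policy choice rather than a calibrated probability cutoff.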
+
+ Citation
+ --------
+
+ ```bibtex
+ @online{DeBloomzGuard,
+   AUTHOR = {Cyrile Delestre},
+   URL = {https://huggingface.co/cmarkea/bloomz-3b-guardrail},
+   YEAR = {2023},
+   KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},
+ }
+ ```