Commit 6e306f6
1 Parent(s): 6ad70ed
Update README.md
README.md CHANGED
@@ -1,3 +1,18 @@
 ---
+language: fr # <-- my language
+widget:
+- text: "J'aime ta coiffure"
+- text: "Va te faire foutre"
+- text: "Quel mauvais temps, n'est-ce pas ?"
+- text: "J'espère que tu vas mourir, connard !"
+- text: "j'aime beaucoup ta veste"
+
 license: other
 ---
+This model was trained for toxicity labeling. Label_1 means TOXIC, Label_0 means NOT_TOXIC.
+
+The model was fine-tuned from the CamemBERT language model (https://huggingface.co/camembert-base).
+
+Accuracy is 93% on the test split during training and 79% on a manually picked (and thus harder) sample of 200 sentences (100 label 1, 100 label 0) evaluated at the end of training.
+
+The model was fine-tuned on 32k sentences. The training data consisted of the English data from https://github.com/s-nlp/multilingual_detox (around 30k sentences), translated into French with https://huggingface.co/Helsinki-NLP/opus-mt-en-fr, plus data from the Jigsaw dataset on Kaggle (https://www.kaggle.com/competitions/jigsaw-multilingual-toxic-comment-classification/data).
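
The label convention stated in the README text (label 1 = TOXIC, label 0 = NOT_TOXIC) could be applied to classifier output roughly as sketched below. This is a minimal sketch, not the author's code. It assumes the checkpoint keeps the default `LABEL_<n>` names that `transformers` assigns when no custom `id2label` is configured. The helper names `readable` and `classify` are hypothetical, and the model's Hub repository id is not given in this commit, so the caller must pass it in.

```python
from typing import Dict, List

# Mapping stated in the model card: label 1 = TOXIC, label 0 = NOT_TOXIC.
# The "LABEL_<n>" spelling is an assumption (the transformers default when
# a checkpoint defines no custom id2label).
LABEL_NAMES: Dict[str, str] = {"LABEL_0": "NOT_TOXIC", "LABEL_1": "TOXIC"}


def readable(prediction: Dict[str, object]) -> Dict[str, object]:
    """Replace the raw label of one prediction with its readable name."""
    raw = str(prediction["label"]).upper()  # accept "Label_1" as well as "LABEL_1"
    return {"label": LABEL_NAMES[raw], "score": prediction["score"]}


def classify(texts: List[str], model_id: str) -> List[Dict[str, object]]:
    """Run a text-classification pipeline and return readable labels.

    model_id is the Hub repository id of this model; it is not stated in
    the commit, so it has to be supplied by the caller.
    """
    # Imported lazily so the label helper works without the heavy dependency.
    # Requires transformers and torch (CamemBERT tokenizers typically also
    # need sentencepiece).
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return [readable(p) for p in classifier(texts)]
```

With the dependencies installed, the widget sentences from the front matter can be passed in directly, for example `classify(["J'aime ta coiffure"], model_id=...)` with the actual repository id filled in.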