EIStakovskii committed on
Commit 6e306f6
1 Parent(s): 6ad70ed

Update README.md

Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -1,3 +1,18 @@
  ---
+ language: fr # <-- my language
+ widget:
+ - text: "J'aime ta coiffure"
+ - text: "Va te faire foutre"
+ - text: "Quel mauvais temps, n'est-ce pas ?"
+ - text: "J'espère que tu vas mourir, connard !"
+ - text: "j'aime beaucoup ta veste"
+
  license: other
  ---
+ This model was trained to label French text for toxicity. Label_1 means TOXIC and Label_0 means NOT_TOXIC.
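The label convention above can be sketched in Python as follows. This is a minimal sketch, not the author's code: `MODEL_ID` is a hypothetical placeholder (the diff does not state the repository id), the helper names `readable_label` and `classify` are made up for illustration, and the raw label strings are assumed to look like `LABEL_0`/`LABEL_1` (matched case-insensitively so that the README's `Label_1` spelling also works).

```python
# Hypothetical placeholder: the diff does not state the model's repository id.
MODEL_ID = "<owner>/<model-name>"

# Assumed raw label names, mapped to the meanings given in the README.
READABLE = {"LABEL_1": "TOXIC", "LABEL_0": "NOT_TOXIC"}


def readable_label(raw_label: str) -> str:
    """Map a raw classifier label to TOXIC / NOT_TOXIC (case-insensitive)."""
    return READABLE[raw_label.upper()]


def classify(texts, model_id=MODEL_ID):
    """Run a text-classification pipeline and return (text, label, score) tuples.

    Calling this downloads the model, so the import is kept local.
    """
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    results = classifier(list(texts))
    return [(t, readable_label(r["label"]), r["score"]) for t, r in zip(texts, results)]
```

With a real repository id in place of the placeholder, `classify(["J'aime ta coiffure"])` would return one tuple per input sentence.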
+
+ The model was fine-tuned from the CamemBERT language model (https://huggingface.co/camembert-base).
+
+ Accuracy is 93% on the test split used during training and 79% on a manually picked (and thus harder) sample of 200 sentences (100 with label 1, 100 with label 0), evaluated at the end of training.
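As a worked check of the second figure: 79% accuracy on 200 sentences corresponds to 158 correct predictions and 42 errors. The sketch below computes accuracy on a balanced sample of that size. The predictions are made up for illustration, and the even split of errors between the two classes is an assumption; the README does not report per-class results.

```python
def accuracy(predictions, references):
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty sample")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Balanced sample as described: 100 sentences with label 1, 100 with label 0.
references = [1] * 100 + [0] * 100

# Illustrative predictions only: 21 mistakes per class (an assumed split).
predictions = [1] * 79 + [0] * 21 + [0] * 79 + [1] * 21

print(accuracy(predictions, references))  # 0.79
```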
+
+ The model was fine-tuned on 32k sentences. The training data consisted of the English data (around 30k sentences) from https://github.com/s-nlp/multilingual_detox translated into French with https://huggingface.co/Helsinki-NLP/opus-mt-en-fr , plus data from the Jigsaw dataset on Kaggle: https://www.kaggle.com/competitions/jigsaw-multilingual-toxic-comment-classification/data .
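The translation step described above might look roughly like the following. This is a sketch under stated assumptions, not the author's actual preprocessing: the function names, the batch size, and the `{"text", "label"}` row layout are hypothetical. Only the `Helsinki-NLP/opus-mt-en-fr` model name comes from the README.

```python
def translate_en_to_fr(sentences, translator=None, batch_size=16):
    """Translate English sentences to French in batches.

    `translator` is any callable that takes a list of strings and returns a
    list of {"translation_text": ...} dicts, the shape produced by a
    transformers translation pipeline.
    """
    if translator is None:
        # Local import: building the pipeline downloads the translation model.
        from transformers import pipeline

        translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

    translated = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        translated.extend(out["translation_text"] for out in translator(batch))
    return translated


def build_training_rows(sentences, labels, translator=None):
    """Pair each translated sentence with its original toxicity label."""
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    french = translate_en_to_fr(sentences, translator)
    return [{"text": text, "label": label} for text, label in zip(french, labels)]
```

Passing the translator in as a parameter keeps the batching logic testable without downloading a model; labels are carried over unchanged, on the assumption that translation preserves toxicity.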