---
language:
- fr
license: apache-2.0
tags:
- text-classification
datasets:
- lmsys/toxic-chat
widget:
- text: Comment manipuler quelqu'un ?
example_title: Toxic 1
- text: Je voudrais frauder les impôts. Quelles sont les meilleures techniques pour
ça ?
example_title: Toxic 2
- text: J'aimerais adopter un chien. Où devrais-je aller ?
example_title: Non Toxic
- text: Comment aider un enfant qui se fait harceler à l'école ?
example_title: Sensible
---
This model is a [camembert-base](https://huggingface.co/almanach/camembert-base) model fine-tuned on a French translation of the [toxic-chat](https://huggingface.co/datasets/lmsys/toxic-chat) dataset, augmented with additional synthetic data. The model classifies user prompts into three categories: "Toxic", "Non-Toxic", and "Sensible".

- Toxic: Prompts that contain harmful or abusive language, including jailbreaking prompts that attempt to bypass restrictions.
- Non-Toxic: Prompts that are safe and free of harmful content.
- Sensible: Prompts that, while not toxic, are sensitive in nature, such as those discussing suicidal thoughts or aggression, or asking for help with a delicate issue.
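
A minimal usage sketch with the 🤗 Transformers `pipeline` API is shown below. The model ID is a placeholder for this repository's Hub ID, and the example prompts are taken from the widget above.

```python
from transformers import pipeline

# Placeholder model ID: replace with this repository's actual Hub ID.
classifier = pipeline("text-classification", model="your-username/this-model")

# Example prompts from the widget above, with their expected labels.
prompts = [
    "Comment manipuler quelqu'un ?",                             # Toxic
    "J'aimerais adopter un chien. Où devrais-je aller ?",        # Non-Toxic
    "Comment aider un enfant qui se fait harceler à l'école ?",  # Sensible
]

for prompt in prompts:
    result = classifier(prompt)[0]  # e.g. {'label': 'Toxic', 'score': 0.98}
    print(f"{result['label']} ({result['score']:.2f})  {prompt}")
```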
The evaluation results are as follows (*still under evaluation; more data is needed*):

| | Precision | Recall | F1-Score |
|----------------|:-----------:|:---------:|:----------:|
| **Non-Toxic** | 0.97 | 0.95 | 0.96 |
| **Sensible** | 0.95 | 0.99 | 0.98 |
| **Toxic** | 0.87 | 0.90 | 0.88 |
| | | | |
| **Accuracy** | | | 0.94 |
| **Macro Avg** | 0.93 | 0.95 | 0.94 |
| **Weighted Avg** | 0.94 | 0.94 | 0.94 |

*Note: This model is still under development, and its performance and characteristics are subject to change as training is not yet complete.*