---
language:
- en
- nl
- fr
- pt
- it
- es
- de
- da
- pl
- af
- multilingual
datasets:
- jigsaw_toxicity_pred
metrics:
- accuracy
- f1
pipeline_tag: text-classification
widget:
- text: this is a lovely message
  example_title: Example 1
  multi_class: false
- text: you are an idiot and you and your family should go back to your country
  example_title: Example 2
  multi_class: false
---
# citizenlab/distilbert-base-multilingual-cased-toxicity
This is a multilingual DistilBERT sequence classifier trained on the [JIGSAW Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge) dataset.
## How to use it
```python
from transformers import pipeline

model_path = "citizenlab/distilbert-base-multilingual-cased-toxicity"
toxicity_classifier = pipeline("text-classification", model=model_path, tokenizer=model_path)

toxicity_classifier("this is a lovely message")
# [{'label': 'not_toxic', 'score': 0.9954179525375366}]

toxicity_classifier("you are an idiot and you and your family should go back to your country")
# [{'label': 'toxic', 'score': 0.9948776960372925}]
```
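
The same checkpoint can also be loaded without the `pipeline` helper. The sketch below is one possible way to do that with `AutoTokenizer` and `AutoModelForSequenceClassification`: it batches the inputs, applies a softmax to the logits, and reads the label names from the model's `config.id2label` mapping. The `classify` helper is purely illustrative and is not part of the model repository.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_path = "citizenlab/distilbert-base-multilingual-cased-toxicity"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)

def classify(texts):
    # Tokenize a batch of strings and run a single forward pass without gradients.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    # Map each argmax index back to the label names stored in the model config.
    return [
        {"label": model.config.id2label[p.argmax().item()], "score": p.max().item()}
        for p in probs
    ]

classify([
    "this is a lovely message",
    "you are an idiot and you and your family should go back to your country",
])
```

Running this on the two widget examples above should give predictions consistent with the pipeline output shown earlier.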
## Evaluation
### Accuracy
```
Accuracy Score = 0.9425
F1 Score (Micro) = 0.9450549450549449
F1 Score (Macro) = 0.8491432341169309
```
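
The micro and macro F1 figures differ because micro-averaging pools true/false positives across both labels, while macro-averaging takes the unweighted mean of the per-class F1 scores; a lower macro score usually indicates that one class (typically the rarer `toxic` class) is harder to predict. The snippet below is a generic sketch, using scikit-learn and placeholder labels, of how such numbers are typically computed; the evaluation split used for this card is not published here.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical gold labels and model predictions for a held-out split;
# these are placeholders, not the data behind the figures above.
y_true = ["not_toxic", "toxic", "toxic", "not_toxic"]
y_pred = ["not_toxic", "toxic", "not_toxic", "not_toxic"]

print("Accuracy Score =", accuracy_score(y_true, y_pred))
# Micro F1 aggregates TP/FP/FN over both classes; macro F1 averages the
# per-class F1 scores, which penalizes weaker performance on the rarer class.
print("F1 Score (Micro) =", f1_score(y_true, y_pred, average="micro"))
print("F1 Score (Macro) =", f1_score(y_true, y_pred, average="macro"))
```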