---
language:
- en
- nl
- fr
- pt
- it
- es
- de
- da
- pl
- af
- multilingual
datasets:
- jigsaw_toxicity_pred
metrics:
- f1
- accuracy
pipeline_tag: text-classification
widget:
- text: this is a lovely message
  example_title: Example 1
  multi_class: false
- text: you are an idiot and you and your family should go back to your country
  example_title: Example 2
  multi_class: false
---

# citizenlab/distilbert-base-multilingual-cased-toxicity

This is a multilingual DistilBERT sequence classifier trained on the [Jigsaw Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge) dataset.

## How to use it

```python
from transformers import pipeline

model_path = "citizenlab/distilbert-base-multilingual-cased-toxicity"
toxicity_classifier = pipeline("text-classification", model=model_path, tokenizer=model_path)

toxicity_classifier("this is a lovely message")
> [{'label': 'not_toxic', 'score': 0.9954179525375366}]

toxicity_classifier("you are an idiot and you and your family should go back to your country")
> [{'label': 'toxic', 'score': 0.9948776960372925}]
```

## Evaluation

### Accuracy

```
Accuracy Score   = 0.9425
F1 Score (Micro) = 0.9450549450549449
F1 Score (Macro) = 0.8491432341169309
```
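Scores of this kind can be computed with standard scikit-learn metrics; a minimal sketch, assuming hypothetical `y_true` / `y_pred` lists of binary labels (0 = `not_toxic`, 1 = `toxic`) collected from a held-out test set (the exact split behind the numbers above is not documented here):

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical gold labels and model predictions, for illustration only;
# the figures reported above come from the model's own evaluation split.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

print("Accuracy Score   =", accuracy_score(y_true, y_pred))
print("F1 Score (Micro) =", f1_score(y_true, y_pred, average="micro"))
print("F1 Score (Macro) =", f1_score(y_true, y_pred, average="macro"))
```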
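## Getting scores for both labels

By default the pipeline returns only the highest-scoring label. For custom thresholding it can help to see the probability of both classes; a minimal sketch, assuming a recent `transformers` version where the text-classification pipeline accepts the `top_k` argument:

```python
from transformers import pipeline

model_path = "citizenlab/distilbert-base-multilingual-cased-toxicity"
toxicity_classifier = pipeline(
    "text-classification", model=model_path, tokenizer=model_path
)

# top_k=None asks the pipeline to return every label's score instead of
# only the best one (the exact output nesting varies across versions).
print(toxicity_classifier("this is a lovely message", top_k=None))
# e.g. [{'label': 'not_toxic', 'score': 0.99...}, {'label': 'toxic', 'score': 0.00...}]
```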