Konstantin committed on
Commit
92d1f1c
1 Parent(s): 6e0ed9b

Update model card

Files changed (1)
  1. README.md +85 -1
README.md CHANGED
@@ -20,4 +20,88 @@ widget:
 
 # German Toxic Comment Classification
 
- The model was trained on the [GermEval21](https://github.com/germeval2021toxic/SharedTask/tree/main/Data%20Sets) and IWG Hatespeech dataset. [[paper](https://arxiv.org/pdf/1701.08118.pdf), [dataset](https://github.com/UCSM-DUE/IWG_hatespeech_public)]
+ ## Model Description
+
+ This model was created to detect toxic or potentially harmful comments.
+
+ For this model, we fine-tuned the German DistilBERT model [distilbert-base-german-cased](https://huggingface.co/distilbert-base-german-cased) on a combination of five German datasets containing toxic, profane, offensive, or hateful speech.
+
+
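+ The original fine-tuning script is not part of this card. As a minimal sketch of the starting point (the variable names and the two-class head below are assumptions based on the description above), the base checkpoint can be loaded for binary sequence classification like this:
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ # German DistilBERT base checkpoint with a fresh 2-class classification head
+ base_checkpoint = "distilbert-base-german-cased"
+ tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
+ model = AutoModelForSequenceClassification.from_pretrained(base_checkpoint, num_labels=2)
+ ```
+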
+ ## Intended Uses & Limitations
+
+ This model can be used to detect toxicity in German comments.
+ However, the definition of toxicity is vague and the model might not be able to detect all instances of toxicity.
+
+ It will not be able to detect toxicity in languages other than German.
+
+
+ ## How to Use
+
+ ```python
+ from transformers import pipeline
+
+ # Model on the Hugging Face Hub: https://huggingface.co/ml6team/distilbert-base-german-cased-toxic-comments
+ model_name = 'ml6team/distilbert-base-german-cased-toxic-comments'
+
+ # Load the model and tokenizer into a text-classification pipeline
+ toxicity_pipeline = pipeline('text-classification', model=model_name, tokenizer=model_name)
+
+ comment = "Ein harmloses Beispiel"  # "A harmless example"
+ result = toxicity_pipeline(comment)[0]
+ print(f"Comment: {comment}\nLabel: {result['label']}, score: {result['score']}")
+ ```
+
+
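+ Depending on the installed `transformers` version, the pipeline can also return the scores for all labels rather than only the top one. Continuing the example above (`top_k` is available on recent versions; older versions use `return_all_scores=True` when creating the pipeline instead):
+
+ ```python
+ # Return the score for every label of the classifier, not just the highest-scoring one
+ all_scores = toxicity_pipeline(comment, top_k=None)
+ print(all_scores)
+ ```
+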
+ ## Limitations and Bias
+
+ The model was trained on a combination of datasets containing examples gathered from different social networks and internet communities. This represents only a narrow subset of possible instances of toxicity, and instances in other domains might not be detected reliably.
+
+
+ ## Training Data
+
+ The training dataset combines the following five datasets:
+
+ * GermEval18 [[dataset](https://github.com/uds-lsv/GermEval-2018-Data)]
+   * Labels: abuse, profanity, toxicity
+ * GermEval21 [[dataset](https://github.com/germeval2021toxic/SharedTask/tree/main/Data%20Sets)]
+   * Labels: toxicity
+ * IWG Hatespeech dataset [[paper](https://arxiv.org/pdf/1701.08118.pdf), [dataset](https://github.com/UCSM-DUE/IWG_hatespeech_public)]
+   * Labels: hate speech
+ * Detecting Offensive Statements Towards Foreigners in Social Media (2017) by Bretschneider and Peters [[dataset](http://ub-web.de/research/)]
+   * Labels: hate
+ * HASOC: 2019 Hate Speech and Offensive Content [[dataset](https://hasocfire.github.io/hasoc/2019/index.html)]
+   * Labels: offensive, profanity, hate
+
+ The datasets contain different labels, ranging from profanity over hate speech to toxicity. For the combined dataset, these labels were subsumed under `toxic` and `non-toxic`, resulting in 23,515 examples in total.
+
+ Note that the datasets vary substantially in the number of examples.
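+
+ The exact preprocessing code is not included here; purely as an illustration of this binary subsumption (the function name and the treatment of unlisted labels are assumptions), the mapping could look like:
+
+ ```python
+ # Source labels named in the dataset list above that are collapsed into `toxic`
+ TOXIC_SOURCE_LABELS = {"abuse", "profanity", "toxicity", "hate speech", "hate", "offensive"}
+
+ def to_binary_label(source_label: str) -> str:
+     """Map a dataset-specific label onto the combined `toxic` / `non-toxic` label set."""
+     return "toxic" if source_label.lower() in TOXIC_SOURCE_LABELS else "non-toxic"
+
+ print(to_binary_label("hate speech"))  # -> toxic
+ print(to_binary_label("other"))        # -> non-toxic (hypothetical non-toxic source label)
+ ```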
+
+
+ ## Training Procedure
+
+ The training and test sets were created using the predefined train/test splits where available; otherwise, 80% of the examples were used for training and 20% for testing. This resulted in 17,072 training examples and 6,443 test examples.
+
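+ The original split code is not part of this card; below is a minimal sketch of the 80%/20% fallback split (the toy data and the use of scikit-learn are assumptions):
+
+ ```python
+ from sklearn.model_selection import train_test_split
+
+ # Toy stand-ins for a dataset that has no predefined train/test split
+ texts = ["Beispiel 1", "Beispiel 2", "Beispiel 3", "Beispiel 4", "Beispiel 5"]
+ labels = ["toxic", "non-toxic", "non-toxic", "toxic", "non-toxic"]
+
+ # 80% of the examples for training, 20% for testing
+ train_texts, test_texts, train_labels, test_labels = train_test_split(
+     texts, labels, test_size=0.2, random_state=42
+ )
+ ```
+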
+ The model was trained for 2 epochs with the following arguments:
+
+ ```python
+ from transformers import TrainingArguments
+
+ # batch_size (and, depending on the transformers version, output_dir) must be defined beforehand
+ training_args = TrainingArguments(
+     per_device_train_batch_size=batch_size,
+     per_device_eval_batch_size=batch_size,
+     num_train_epochs=2,
+     evaluation_strategy="steps",
+     logging_strategy="steps",
+     logging_steps=100,
+     save_total_limit=5,
+     learning_rate=2e-5,
+     weight_decay=0.01,
+     metric_for_best_model='accuracy',
+     load_best_model_at_end=True
+ )
+ ```
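+
+ These arguments would typically be passed to a `Trainer`; the sketch below shows one plausible wiring (the model and dataset variables and the accuracy-only metric function are assumptions, not the original training script):
+
+ ```python
+ import numpy as np
+ from sklearn.metrics import accuracy_score
+ from transformers import Trainer
+
+ def compute_metrics(eval_pred):
+     # The Trainer passes (logits, labels); report accuracy so that
+     # metric_for_best_model='accuracy' can select the best checkpoint.
+     logits, labels = eval_pred
+     predictions = np.argmax(logits, axis=-1)
+     return {"accuracy": accuracy_score(labels, predictions)}
+
+ trainer = Trainer(
+     model=model,                      # fine-tuned DistilBERT classifier (see sketch above)
+     args=training_args,               # the TrainingArguments shown above
+     train_dataset=train_dataset,      # assumed: tokenized training split
+     eval_dataset=test_dataset,        # assumed: tokenized test split
+     compute_metrics=compute_metrics,
+ )
+ trainer.train()
+ ```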
+
+ ## Evaluation Results
+
+ Model evaluation was done on 1/10th of the dataset, which served as the test dataset.
+
+ | Accuracy | F1 Score | Recall | Precision |
+ |----------|----------|--------|-----------|
+ | 78.50    | 50.34    | 39.22  | 70.27     |
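+
+ As a quick consistency check, the reported F1 score follows from the reported precision and recall:
+
+ $$ F_1 = \frac{2 \cdot P \cdot R}{P + R} = \frac{2 \cdot 0.7027 \cdot 0.3922}{0.7027 + 0.3922} \approx 0.5034 $$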