Model Card for German Hate Speech Classifier

Model Details

Introduction

This model was developed to explore the potential of German language models for multi-class classification of hate speech in German online journals. It is a fine-tuned version of GBERT (Chan, Schweter, and Möller, 2020).

Dataset

The dataset used for training is a consolidation of three pre-existing German hate speech datasets:

  • RP (Assenmacher et al., 2021)
  • DeTox (Demus et al., 2022)
  • Twitter dataset (Glasenbach, 2022)

The combined dataset underwent cleaning to minimize biases and remove redundant data.
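
The card does not describe the cleaning steps in detail. As a rough illustration only, removing redundant comments while merging several sources could look like the sketch below. The normalization rule (Unicode NFKC, case folding, whitespace collapsing), the function names, and the toy records are all assumptions made for this example and are not taken from the authors' pipeline.

```python
import unicodedata


def normalize(text: str) -> str:
    """Build a comparison key for a comment (assumed rule, not the authors')."""
    text = unicodedata.normalize("NFKC", text)
    # Case-fold and collapse runs of whitespace so trivial variants match.
    return " ".join(text.casefold().split())


def merge_and_deduplicate(*datasets):
    """Merge (text, label) records, keeping the first occurrence of each text."""
    seen = set()
    merged = []
    for records in datasets:
        for text, label in records:
            key = normalize(text)
            if key and key not in seen:
                seen.add(key)
                merged.append((text, label))
    return merged


# Toy placeholder records, not drawn from the real datasets.
rp = [("Das ist ein Kommentar.", "No Hate Speech")]
detox = [
    ("das ist  ein KOMMENTAR.", "No Hate Speech"),
    ("Noch ein Kommentar", "No Hate Speech"),
]
twitter = [("Noch ein Kommentar", "No Hate Speech")]

combined = merge_and_deduplicate(rp, detox, twitter)
print(len(combined))
```

Keeping the first occurrence is one possible policy. If the same comment carries different labels in different sources, a real pipeline would need an explicit rule for resolving the conflict.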

Performance

Our experiments delivered promising results. The model classifies comments into five classes:

  • No Hate Speech
  • Other Hate Speech (Threat, Insult, Profanity)
  • Political Hate Speech
  • Racist Hate Speech
  • Sexist Hate Speech
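
To make the five-class output concrete, the sketch below turns a vector of five raw scores (logits) into a class name and a probability. The index order of `LABELS` simply follows the list above and is an assumption; the actual mapping is defined by the model's configuration and should be checked there. The logits in the example are made up.

```python
import math

# Assumed index order (follows the list above); verify against the model config.
LABELS = [
    "No Hate Speech",
    "Other Hate Speech (Threat, Insult, Profanity)",
    "Political Hate Speech",
    "Racist Hate Speech",
    "Sexist Hate Speech",
]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return the most probable class name and its probability."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per class")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits for illustration only.
label, confidence = decode([0.2, 0.1, 2.5, -1.0, 0.3])
print(label, round(confidence, 3))
```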

The model achieved a macro F1-score of 0.775. Further improvements are needed to reduce misclassifications. In particular, short comments are disproportionately often classified as Sexist Hate Speech.
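
For readers unfamiliar with the metric: the macro F1-score is the unweighted mean of the per-class F1-scores, so each class counts equally regardless of how many comments it contains. The sketch below computes it from scratch on toy labels; the helper name `macro_f1` and the example data are illustrative assumptions and do not reproduce the reported evaluation.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only.
classes = ["No Hate Speech", "Racist Hate Speech", "Sexist Hate Speech"]
y_true = ["No Hate Speech", "No Hate Speech", "Racist Hate Speech", "Sexist Hate Speech"]
y_pred = ["No Hate Speech", "Racist Hate Speech", "Racist Hate Speech", "Sexist Hate Speech"]

score = macro_f1(y_true, y_pred, classes)
print(round(score, 4))
```

Because every class has equal weight, a weakness on one class (such as short comments being pushed into a single category) lowers the macro score even if that class is rare.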

Model size: 110M params (Safetensors; tensor types: I64, F32)