Edit model card

Danish Offensive Text Detection based on ELECTRA-small

This model is a fine-tuned version of Maltehb/aelaectra-danish-electra-small-cased on a dataset consisting of approximately 5 million Facebook comments on DR's public Facebook pages. The labels have been automatically generated using weak supervision, based on the Snorkel framework.

The model almost achieves SOTA results while being 20x smaller, on a test set consisting of 600 Facebook comments annotated using majority vote by three annotators, of which 35.8% were labelled as offensive:

Model Precision Recall F1-score F2-score
alexandrainst/da-offensive-detection-base 74.81% 89.77% 81.61% 86.32%
alexandrainst/da-offensive-detection-small (this) 74.13% 89.30% 81.01% 85.79%
A&ttack 97.32% 50.70% 66.67% 56.07%
alexandrainst/da-hatespeech-detection-small 86.43% 56.28% 68.17% 60.50%
Guscode/DKbert-hatespeech-detection 75.41% 42.79% 54.60% 46.84%

Using the model

You can use the model simply by running the following:

>>> from transformers import pipeline
>>> offensive_text_pipeline = pipeline(model="alexandrainst/da-offensive-detection-small")
>>> offensive_text_pipeline("Din store idiot")
[{'label': 'Offensive', 'score': 0.9997463822364807}]

Processing multiple documents at the same time can be done as follows:

>>> offensive_text_pipeline(["Din store idiot", "ej hvor godt :)"])
[{'label': 'Offensive', 'score': 0.9997463822364807}, {'label': 'Not offensive', 'score': 0.9996451139450073}]

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • gradient_accumulation_steps: 1
  • total_train_batch_size: 32
  • seed: 4242
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • max_steps: 500000
  • fp16: True
  • eval_steps: 1000
  • early_stopping_patience: 100

Framework versions

  • Transformers 4.20.1
  • Pytorch 1.11.0+cu113
  • Datasets 2.3.2
  • Tokenizers 0.12.1
Downloads last month
5

Finetuned from

Space using alexandrainst/da-offensive-detection-small 1