Danish Offensive Text Detection based on ELECTRA-small

This model is a fine-tuned version of Maltehb/aelaectra-danish-electra-small-cased, trained on approximately 5 million comments from DR's public Facebook pages. The labels were generated automatically through weak supervision, using the Snorkel framework.

The model comes close to state-of-the-art (SOTA) results while being 20x smaller. The results below are from a test set of 600 Facebook comments, each labelled by majority vote among three annotators; 35.8% of the comments were labelled as offensive:

Model	Precision	Recall	F1-score	F2-score
`alexandrainst/da-offensive-detection-base`	74.81%	89.77%	81.61%	86.32%
`alexandrainst/da-offensive-detection-small` (this)	74.13%	89.30%	81.01%	85.79%
`A&ttack`	97.32%	50.70%	66.67%	56.07%
`alexandrainst/da-hatespeech-detection-small`	86.43%	56.28%	68.17%	60.50%
`Guscode/DKbert-hatespeech-detection`	75.41%	42.79%	54.60%	46.84%
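
The table does not spell out how the F1 and F2 scores are defined. Assuming they follow the standard F-beta definition (a weighted harmonic mean of precision and recall, where beta = 2 weights recall more heavily), the reported values for this model can be reproduced from its precision and recall. The helper name `f_beta` is illustrative and not part of the model's code:

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (standard F-beta)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall of this model, taken from the table above.
precision, recall = 0.7413, 0.8930

f1 = f_beta(precision, recall, beta=1.0)
f2 = f_beta(precision, recall, beta=2.0)
print(f"F1 = {f1:.2%}, F2 = {f2:.2%}")  # F1 = 81.01%, F2 = 85.79%
```

Because F2 rewards recall, models with high precision but low recall (such as `A&ttack` in the table) score noticeably lower on F2 than on F1.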

Using the model

You can use the model simply by running the following:

>>> from transformers import pipeline
>>> offensive_text_pipeline = pipeline(model="alexandrainst/da-offensive-detection-small")
>>> offensive_text_pipeline("Din store idiot")
[{'label': 'Offensive', 'score': 0.9997463822364807}]

Processing multiple documents at the same time can be done as follows:

>>> offensive_text_pipeline(["Din store idiot", "ej hvor godt :)"])
[{'label': 'Offensive', 'score': 0.9997463822364807}, {'label': 'Not offensive', 'score': 0.9996451139450073}]
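
The pipeline returns one dictionary per input, in input order. As a minimal sketch of how that output could be post-processed, the snippet below keeps only the texts labelled `Offensive` with a score above a cutoff. The function name `flag_offensive` and the cutoff of 0.9 are assumptions made for illustration; the predictions are copied from the output above so that the snippet runs without downloading the model:

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def flag_offensive(
    texts: List[str],
    predictions: List[Prediction],
    threshold: float = 0.9,  # hypothetical cutoff, tune for your use case
) -> List[str]:
    """Return the texts predicted as 'Offensive' with score >= threshold."""
    return [
        text
        for text, pred in zip(texts, predictions)
        if pred["label"] == "Offensive" and pred["score"] >= threshold
    ]


texts = ["Din store idiot", "ej hvor godt :)"]
# Copied from the pipeline output shown above.
predictions = [
    {"label": "Offensive", "score": 0.9997463822364807},
    {"label": "Not offensive", "score": 0.9996451139450073},
]

print(flag_offensive(texts, predictions))  # ['Din store idiot']
```

In practice, `predictions` would be the return value of `offensive_text_pipeline(texts)`.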

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 32
eval_batch_size: 32
gradient_accumulation_steps: 1
total_train_batch_size: 32
seed: 4242
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
max_steps: 500000
fp16: True
eval_steps: 1000
early_stopping_patience: 100
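
To put these numbers in context, the sketch below derives a few quantities from the hyperparameters listed above. It assumes training on a single device, that the full set of roughly 5 million comments was used for training, and that the patience is counted in evaluation rounds; none of these is stated explicitly above. The figures are upper bounds, since early stopping may have ended training before `max_steps`:

```python
# Values copied from the hyperparameter list above.
hparams = {
    "train_batch_size": 32,
    "gradient_accumulation_steps": 1,
    "max_steps": 500_000,
    "eval_steps": 1_000,
    "early_stopping_patience": 100,
}

num_devices = 1  # assumption: single-device training
approx_dataset_size = 5_000_000  # "approximately 5 million" comments

# Number of samples contributing to one optimizer update.
effective_batch_size = (
    hparams["train_batch_size"]
    * hparams["gradient_accumulation_steps"]
    * num_devices
)

# Upper bound on samples seen, and on passes over the data.
max_samples_seen = hparams["max_steps"] * effective_batch_size
max_epochs = max_samples_seen / approx_dataset_size

# Steps without improvement before early stopping would trigger,
# assuming patience is measured in evaluation rounds.
patience_in_steps = hparams["eval_steps"] * hparams["early_stopping_patience"]

print(f"effective batch size: {effective_batch_size}")
print(f"max samples seen:     {max_samples_seen:,}")
print(f"max epochs (approx.): {max_epochs:.1f}")
print(f"patience in steps:    {patience_in_steps:,}")
```

The computed effective batch size of 32 agrees with the listed `total_train_batch_size`.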

Framework versions

Transformers 4.20.1
Pytorch 1.11.0+cu113
Datasets 2.3.2
Tokenizers 0.12.1
