---
license: mit
model-index:
- name: xlm-roberta-base-offensive-text-detection-da
  results: []
widget:
- text: Din store idiot
---
# Danish Offensive Text Detection based on XLM-RoBERTa-base
This model is a fine-tuned version of xlm-roberta-base, trained on a dataset of approximately 5 million Facebook comments from DR's public Facebook pages. The labels were generated automatically with weak supervision, using the Snorkel framework.
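For inference, the model can be loaded through the standard Transformers text-classification pipeline. A minimal sketch follows; the label names in the illustrative output are assumptions, not taken from this card:

```python
from transformers import pipeline

# Load this model through the text-classification pipeline.
classifier = pipeline(
    "text-classification",
    model="alexandrainst/xlm-roberta-base-offensive-text-detection-da",
)

# Score the widget example; the label name below is assumed,
# not confirmed by this card.
print(classifier("Din store idiot"))
# e.g. [{'label': 'Offensive', 'score': 0.98}]  (illustrative)
```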
The model places second on a test set of 500 Facebook comments annotated by two annotators, 41.2% of which were labelled offensive:
| Model | Precision | Recall | F1-score |
|---|---|---|---|
| alexandrainst/electra-small-offensive-text-detection-da | 85.45% | 91.26% | 88.26% |
| alexandrainst/xlm-roberta-base-offensive-text-detection-da (this model) | 83.48% | 93.20% | 88.07% |
| A-ttack | 99.17% | 58.25% | 73.39% |
| DaNLP/da-electra-hatespeech-detection | 92.19% | 57.28% | 70.66% |
| Guscode/DKbert-hatespeech-detection | 84.91% | 43.69% | 57.69% |
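The metrics in the table follow the standard definitions for the positive (offensive) class. A minimal sketch with scikit-learn, using placeholder predictions rather than the actual 500-comment test set:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Placeholder labels, for illustration only: 1 = offensive, 0 = not offensive.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

print(f"Precision: {precision_score(y_true, y_pred):.2%}")
print(f"Recall:    {recall_score(y_true, y_pred):.2%}")
print(f"F1-score:  {f1_score(y_true, y_pred):.2%}")
```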
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- gradient_accumulation_steps: 1
- total_train_batch_size: 32
- seed: 4242
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- max_steps: 500000
- fp16: True
- eval_steps: 1000
- early_stopping_patience: 100
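A hedged sketch of how these settings could map onto the Transformers `TrainingArguments` API; the `output_dir`, the `save_steps` value, and the early-stopping wiring via `EarlyStoppingCallback` are assumptions, not taken from a published training script:

```python
from transformers import TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="xlm-roberta-base-offensive-text-detection-da",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=1,
    seed=4242,
    adam_beta1=0.9,    # Adam betas and epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=500_000,
    fp16=True,
    evaluation_strategy="steps",
    eval_steps=1000,
    save_steps=1000,               # assumed: align checkpointing with evaluation
    load_best_model_at_end=True,   # required for early stopping
)

# Early stopping with the listed patience (assumed to use the
# standard EarlyStoppingCallback, passed to the Trainer).
early_stopping = EarlyStoppingCallback(early_stopping_patience=100)
```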
### Framework versions
- Transformers 4.20.1
- Pytorch 1.11.0+cu113
- Datasets 2.3.2
- Tokenizers 0.12.1