
language:
  - da
  - 'no'
library_name: transformers
f1-score: 0.83

Model Card for A&ttack2

A text classification model for determining if a social media post in Danish or Norwegian contains a verbal attack.

Model Description

The model is based on North-T5-NCC Large (developed by Per E. Kummervold), a Scandinavian language model built upon T5 and T5X. It was further trained on ~70k Norwegian and ~67k Danish social media posts, each classified as either 'verbal attack' or 'nothing'. This makes it a text-to-text model restricted to classification. The model is described in Danish in this report.

Developed by: The development team at Analyse & Tal
Model type: Language model restricted to classification
Language(s) (NLP): Danish and Norwegian
License: [More Information Needed]
Finetuned from model: North-T5-NCC Large

Direct Use

This model can be used for classifying Danish and Norwegian social media posts or similar text.

Bias, Risks, and Limitations

[More Information Needed]

Training Data

A collection of ~70k Norwegian and ~67k Danish social media posts has been manually annotated as either 'verbal attack' or 'nothing'. Five percent of the posts were annotated by more than one annotator, and the annotators agreed in 83% of these cases.
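The 83% figure reads as raw percent agreement: the share of double-annotated posts where the annotators chose the same label. The following is a minimal sketch under that reading. It assumes exactly two annotators per post and uses hypothetical example data. The card does not say whether a chance-corrected measure such as Cohen's kappa was also computed.

```python
# Hypothetical example data: each pair holds the two labels assigned to one
# double-annotated post. These are NOT the actual A&ttack2 annotations.
def percent_agreement(label_pairs):
    """Return the share of items on which both annotators chose the same label."""
    if not label_pairs:
        raise ValueError("need at least one double-annotated item")
    agreements = sum(1 for first, second in label_pairs if first == second)
    return agreements / len(label_pairs)


pairs = [
    ("verbal attack", "verbal attack"),
    ("nothing", "nothing"),
    ("verbal attack", "nothing"),
    ("nothing", "nothing"),
]
print(percent_agreement(pairs))  # 0.75
```

Percent agreement is easy to read but does not account for agreement expected by chance. That chance agreement can be high when one label dominates the data.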

[More information needed on the data split method and the training-validation-test split.]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

Macro-averaged F1 score: 0.83
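Macro-averaging computes F1 separately for each class ('verbal attack' and 'nothing') and then takes the unweighted mean. As a result, the minority class counts as much as the majority class. The following is a minimal pure-Python sketch with hypothetical labels. It follows the usual definition, as in scikit-learn's `f1_score(..., average="macro")`, but the card does not state which tool or test set produced the 0.83 figure.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (macro average)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # Count true positives, false positives and false negatives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions, not results from this model.
y_true = ["attack", "attack", "nothing", "nothing"]
y_pred = ["attack", "nothing", "nothing", "nothing"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
```

In this toy case, the 'attack' class has F1 = 2/3 and the 'nothing' class has F1 = 0.8. Their unweighted mean is about 0.733.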

[More Information Needed]

Results

[More Information Needed]

Summary

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
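Broadly, this kind of estimate multiplies the energy used by the carbon intensity of the local power grid. The following is a simplified sketch of that arithmetic. It ignores data-center overhead (PUE) and any provider offsets. All input values are hypothetical placeholders, because the card does not report the hardware, the training time, or the carbon intensity of the Azure North-Europe region.

```python
def estimate_emissions_kg(power_kw, hours, carbon_intensity_kg_per_kwh):
    """Simplified estimate: energy used (kWh) times grid carbon intensity (kg CO2eq/kWh)."""
    energy_kwh = power_kw * hours
    return energy_kwh * carbon_intensity_kg_per_kwh


# Hypothetical placeholder inputs, not measurements from training this model.
estimate = estimate_emissions_kg(power_kw=0.3, hours=10, carbon_intensity_kg_per_kwh=0.4)
print(f"{estimate:.2f} kg CO2eq")
```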

Hardware Type: [More Information Needed]
Hours used: [More Information Needed]
Cloud Provider: Azure
Compute Region: North-Europe
Carbon Emitted: [More Information Needed]

Model Card Authors

This model card was written by the developer team at Analyse & Tal. Contact: oyvind@ogtal.dk.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download/load tokenizer and language model
tokenizer = AutoTokenizer.from_pretrained("ogtal/A-og-ttack2")
model = AutoModelForSeq2SeqLM.from_pretrained("ogtal/A-og-ttack2")

# Give sample text. The example is from a social media comment.
sample_text = "Velbekomme dit klamme usle løgnersvin!"
input_ids = tokenizer(sample_text, return_tensors="pt").input_ids

# Generate the output label and print it as plain text
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Running the code above prints "angreb" (Danish for "attack").
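To classify many posts at once, the same calls can be batched with padding. The following is a minimal sketch. The helper names `classify_posts`, `is_attack` and `ATTACK_OUTPUTS` are hypothetical. The card only confirms the output "angreb" for a Danish attack. The output strings for Norwegian input and for the 'nothing' class are not documented, so the sketch returns the raw decoded strings. It treats only outputs listed in `ATTACK_OUTPUTS` as attacks.

```python
from typing import List

# Outputs taken to mean "verbal attack". Only "angreb" is confirmed by the card;
# this set is an assumption and should be extended after inspecting real outputs
# (for example on Norwegian posts).
ATTACK_OUTPUTS = {"angreb"}


def is_attack(generated_text: str) -> bool:
    """Map a decoded model output to True (attack) or False (anything else)."""
    return generated_text.strip().lower() in ATTACK_OUTPUTS


def classify_posts(texts: List[str], model_name: str = "ogtal/A-og-ttack2") -> List[str]:
    """Classify a batch of posts and return the raw decoded output strings."""
    # Imported lazily so is_attack() can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    # Pad so posts of different lengths fit in one batch; truncate overly long posts.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**inputs)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

For example, `[is_attack(label) for label in classify_posts(posts)]` gives one boolean per post. The sketch loads the model on every call for simplicity. In practice, you would load the tokenizer and model once and reuse them.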