language:
- da
- 'no'
library_name: transformers
f1-score: 0.76
Model Card for A&ttack2
Text classification model that determines whether or not a short text contains an attack.
Model Description
The model is based on North-T5-NCC Large (developed by Per E. Kummervold), a Scandinavian language model built upon T5 and T5X. It was further trained on ~70k Norwegian and ~67k Danish social media posts, each classified as either 'attack' or 'not attack', which adapts the text-to-text model to perform classification. The model is described in Danish in this report.
- Developed by: The development team at Analyse & Tal
- Model type: Language model restricted to classification
- Language(s) (NLP): Danish and Norwegian
- License: [More Information Needed]
- Finetuned from model: North-T5-NCC Large
Direct Use
The model can be used directly to classify Danish and Norwegian social media posts (or similar pieces of text).
Bias, Risks, and Limitations
[More Information Needed]
Training Data
A collection of ~70k Norwegian and ~67k Danish social media posts has been manually annotated as 'attack' or 'not attack' by six individual coders. 5% of the posts have been annotated by more than one annotator, with the annotators in agreement for 83% of these annotations.
What is the data-split method? What is the training-validation-test split?
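The card does not say whether the 83% figure is raw agreement or a chance-corrected statistic. As a minimal sketch, assuming it is plain pairwise percent agreement on the posts that were annotated twice, it could be computed like this (the function name and the toy labels are illustrative, not the project's data):

```python
def percent_agreement(labels_a, labels_b):
    """Share of doubly-annotated posts on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same posts")
    if not labels_a:
        raise ValueError("need at least one doubly-annotated post")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Toy example with made-up labels (not taken from the training data).
annotator_1 = ["attack", "not attack", "attack", "not attack", "not attack", "attack"]
annotator_2 = ["attack", "not attack", "not attack", "not attack", "not attack", "attack"]
agreement = percent_agreement(annotator_1, annotator_2)
print(f"Agreement: {agreement:.0%}")
```

Raw agreement does not correct for agreement expected by chance, so a chance-corrected measure may be worth reporting alongside it when the classes are imbalanced.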
Evaluation
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
Macro-averaged F1-score: 0.76
[More Information Needed]
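For readers unfamiliar with the metric, the sketch below shows the standard definition of a macro-averaged F1-score: the per-class F1 is computed for each label and the results are averaged with equal weight. The card does not document how the reported 0.76 was computed, so this is an illustration of the definition only, and the labels are toy values.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denominator = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        per_class.append(2 * tp / denominator if denominator else 0.0)
    return sum(per_class) / len(per_class)


# Toy example with made-up labels (not the model's test data).
y_true = ["attack", "attack", "not attack", "not attack"]
y_pred = ["attack", "not attack", "not attack", "not attack"]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

Because every class counts equally, the macro average penalises poor performance on the rarer class more than accuracy would.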
Results
[More Information Needed]
Summary
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: Azure
- Compute Region: North-Europe
- Carbon Emitted: [More Information Needed]
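Once the hardware type and hours are filled in, a rough estimate follows from multiplying energy use by the carbon intensity of the grid, which is the general approach behind the calculator. The sketch below assumes that simple relationship; the function name is hypothetical and every number in the example is a placeholder, not a value for this model.

```python
def estimate_emissions_kg(power_kw, hours, carbon_intensity_kg_per_kwh):
    """Rough emissions estimate in kg CO2-equivalent."""
    if min(power_kw, hours, carbon_intensity_kg_per_kwh) < 0:
        raise ValueError("inputs must be non-negative")
    energy_kwh = power_kw * hours  # energy drawn by the hardware
    return energy_kwh * carbon_intensity_kg_per_kwh


# Hypothetical placeholder inputs, NOT the actual training setup.
example_kg = estimate_emissions_kg(
    power_kw=0.3,                     # assumed average hardware draw
    hours=10,                         # assumed training time
    carbon_intensity_kg_per_kwh=0.2,  # assumed grid intensity for the region
)
print(f"{example_kg:.2f} kg CO2eq")
```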
Model Card Authors
This model card was written by the developer team at Analyse & Tal. Contact: oyvind@ogtal.dk.
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# Download/load tokenizer and language model
tokenizer = AutoTokenizer.from_pretrained("ogtal/A-og-ttack2")
model = AutoModelForSeq2SeqLM.from_pretrained("ogtal/A-og-ttack2")
# Give sample text. The example is from a social media comment.
sample_text = "Velbekomme dit klamme usle løgnersvin!"
input_ids = tokenizer(sample_text, return_tensors="pt").input_ids
# Forward pass and print the output
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Running the above code will print "angreb" ("attack" in Danish).
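To classify several posts at once, the same calls can be wrapped in a small helper. This is a sketch, not part of the released code: `classify_batch` and `load_and_classify` are hypothetical names, and the card only documents the "angreb" output, so the helper compares against that string and makes no assumption about the label the model emits for non-attacks.

```python
def classify_batch(texts, tokenizer, model, attack_label="angreb"):
    """Return (generated_label, is_attack) for each text in `texts`."""
    # Pad to a rectangular batch and truncate overly long posts.
    encoded = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**encoded)
    labels = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return [(label, label.strip() == attack_label) for label in labels]


def load_and_classify(texts, model_id="ogtal/A-og-ttack2"):
    """Load the tokenizer and model from the Hub, then classify `texts`."""
    # Imported lazily so classify_batch can be reused without transformers installed.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return classify_batch(texts, tokenizer, model)
```

Calling `load_and_classify(["Velbekomme dit klamme usle løgnersvin!"])` downloads the model on first use and returns a list with one `(label, is_attack)` pair per input text.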