---
language:
- 'da'
- 'no'
library_name: transformers
f1-score: 0.76
---

# Model Card for A&ttack2

Text classification model that determines whether or not a short text contains an attack.

# Model Description

The model is based on [North-T5-NCC Large](https://huggingface.co/north/t5_large_NCC) (developed by Per E. Kummervold), a Scandinavian language model built upon [T5](https://github.com/google-research/text-to-text-transfer-transformer) and [T5X](https://github.com/google-research/t5x). The model is further trained on ~70k Norwegian and ~67k Danish social media posts that have been classified as either 'attack' or 'not attack', making it a text-to-text model adapted to do classification.

The model is described in Danish in [this report](https://strapi.ogtal.dk/uploads/966f1ebcfa9942d3aef338e9920611f4.pdf).

- **Developed by:** The development team at Analyse & Tal
- **Model type:** Language model restricted to classification
- **Language(s) (NLP):** Danish and Norwegian
- **License:** [More Information Needed]
- **Finetuned from model:** [More information needed]

# Direct Use

The model can be used directly to classify Danish and Norwegian social media posts (or similar pieces of text).

# Bias, Risks, and Limitations

[More Information Needed]

# Training Data

A collection of ~70k Norwegian and ~67k Danish social media posts has been manually annotated as 'attack' or 'not attack' by six individual coders. 5% of the posts have been annotated by more than one annotator, with the annotators in agreement for 83% of annotations.

*What is the data-split method? What is the training-validation-test split?*

# Evaluation

## Testing Data, Factors & Metrics

### Testing Data

[More Information Needed]

### Factors

[More Information Needed]

### Metrics

Macro-averaged f1-score: 0.76

[More Information Needed]

## Results

[More Information Needed]

### Summary

# Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** Azure
- **Compute Region:** North-Europe
- **Carbon Emitted:** [More Information Needed]

# Model Card Authors

This model card was written by the developer team at Analyse & Tal. Contact: oyvind@ogtal.dk.

# How to Get Started with the Model

Use the code below to get started with the model.

```
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download/load tokenizer and language model
tokenizer = AutoTokenizer.from_pretrained("ogtal/A-og-ttack2")
model = AutoModelForSeq2SeqLM.from_pretrained("ogtal/A-og-ttack2")

# Give sample text. The example is from a social media comment.
sample_text = "Velbekomme dit klamme usle løgnersvin!"

# Tokenize the full sample text into PyTorch tensors
input_ids = tokenizer(sample_text, return_tensors="pt").input_ids

# Forward pass and print the output
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Running the above code will print "angreb" (attack in Danish).
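
The macro-averaged f1-score reported under Metrics is the unweighted mean of the per-class F1 scores. The following is a minimal sketch of that calculation in plain Python, not the evaluation code used for this model. The label "angreb" comes from the example above; the label "ikke angreb" and the toy data are assumptions made purely for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    # Count true positives, false positives and false negatives per label
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example. "ikke angreb" is a hypothetical label for the 'not attack' class.
y_true = ["angreb", "angreb", "ikke angreb", "ikke angreb"]
y_pred = ["angreb", "ikke angreb", "ikke angreb", "ikke angreb"]
print(round(macro_f1(y_true, y_pred), 3))
```

Because each class contributes equally to the mean regardless of how many posts it contains, the macro average penalizes poor performance on a minority class more than plain accuracy would.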