language:
  - en
license: apache-2.0
library_name: transformers
tags:
  - incivility
metrics:
  - f1
widget:
  - text: Be careful around those DemocRats.
    example_title: Namecall
  - text: Be careful around those Democrats.
    example_title: No Namecall

Model Card for roberta-base-namecalling

This is a RoBERTa-base model fine-tuned on approximately 12K social media posts annotated for the presence or absence of namecalling.

How to Get Started with the Model

You can use this model directly with a pipeline for text classification:

>>> import transformers
>>> model_name = "civility-lab/roberta-base-namecalling"
>>> classifier = transformers.TextClassificationPipeline(
...     tokenizer=transformers.AutoTokenizer.from_pretrained(model_name),
...     model=transformers.AutoModelForSequenceClassification.from_pretrained(model_name))
>>> classifier("Be careful around those Democrats.")
[{'label': 'not-namecalling', 'score': 0.9995089769363403}]
>>> classifier("Be careful around those DemocRats.")
[{'label': 'namecalling', 'score': 0.996940016746521}]
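
Each call returns a list with one dict per input, holding the predicted `label` and its `score`, as in the outputs above. The following is a minimal sketch of turning such a prediction into a yes/no decision with a confidence cutoff. The helper name `contains_namecalling` and the 0.9 threshold are illustrative assumptions, not part of the model or of the transformers API.

```python
def contains_namecalling(prediction, threshold=0.9):
    """Return True if a single pipeline prediction flags namecalling.

    `prediction` is one dict from the classifier output, e.g.
    {'label': 'namecalling', 'score': 0.99}.
    `threshold` is a hypothetical cutoff chosen only for illustration.
    """
    return (
        prediction["label"] == "namecalling"
        and prediction["score"] >= threshold
    )


# Predictions copied from the example above, so no model download is needed.
outputs = [
    {"label": "not-namecalling", "score": 0.9995089769363403},
    {"label": "namecalling", "score": 0.996940016746521},
]
flags = [contains_namecalling(p) for p in outputs]
print(flags)  # [False, True]
```

A higher threshold trades recall for precision; a suitable value would need to be chosen on development data for the target domain.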

Model Details

This is a 2023 update of the model built by Ozler et al. (2020) incorporating data from Rains et al. (2021) and using a more recent version of the transformers library.

Uses

The model is intended for text classification: it takes a social media post as input and predicts whether the post contains namecalling.

It is not intended to generate namecalling, and it should not be used as part of any incivility generation model.

Training Details

The model was trained on data from four sources: comments on the Arizona Daily Star website from 2011, Russian troll Tweets from 2012-2018, Tucson politician Tweets from 2018, and US presidential primary Tweets from 2019. Each dataset was annotated for the presence of namecalling following the approach of Coe et al. (2014) and split into training, development, and test partitions.

The roberta-base model was fine-tuned on the combined training partitions from all four datasets, with texts tokenized using the standard roberta-base tokenizer.
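
The data layout described above can be sketched as follows: training partitions from every source are pooled for fine-tuning, while test partitions stay separate so that each source can be scored on its own. The source keys, placeholder texts, and labels below are invented for illustration and are not the actual data; `pool_partition` is a hypothetical helper.

```python
# Hypothetical toy data standing in for the four annotated sources.
# Each example is (text, label) with 1 = namecalling, 0 = not-namecalling.
datasets = {
    "arizona_daily_star": {
        "train": [("placeholder comment 1", 0), ("placeholder comment 2", 1)],
        "dev": [("placeholder comment 3", 0)],
        "test": [("placeholder comment 4", 1)],
    },
    "russian_troll": {
        "train": [("placeholder tweet 1", 1), ("placeholder tweet 2", 0)],
        "dev": [("placeholder tweet 3", 0)],
        "test": [("placeholder tweet 4", 0)],
    },
    "tucson_politician": {
        "train": [("placeholder tweet 5", 0), ("placeholder tweet 6", 0)],
        "dev": [("placeholder tweet 7", 1)],
        "test": [("placeholder tweet 8", 0)],
    },
    "presidential_primary": {
        "train": [("placeholder tweet 9", 1), ("placeholder tweet 10", 0)],
        "dev": [("placeholder tweet 11", 0)],
        "test": [("placeholder tweet 12", 1)],
    },
}


def pool_partition(datasets, partition):
    """Concatenate one partition (e.g. "train") across all sources."""
    pooled = []
    for source in sorted(datasets):  # sorted for a reproducible order
        pooled.extend(datasets[source][partition])
    return pooled


# One combined training set; one held-out test set per source.
train_examples = pool_partition(datasets, "train")
test_sets = {source: parts["test"] for source, parts in datasets.items()}

print(len(train_examples))  # 8
print(sorted(test_sets))
```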

Evaluation

The model was evaluated on the test partition of each of the four datasets, where it achieves the following F1 scores:

  • 0.58 F1 on Arizona Daily Star comments
  • 0.71 F1 on Russian troll Tweets
  • 0.71 F1 on Tucson politician Tweets
  • 0.81 F1 on US presidential primary Tweets
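
For reference, F1 is the harmonic mean of precision and recall. The sketch below computes it for a single positive class on toy labels. Treating namecalling as the positive class is an assumption, since the averaging scheme is not stated here, and the gold and predicted labels are invented for illustration.

```python
def binary_f1(gold, predicted, positive=1):
    """F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (1 = namecalling, 0 = not-namecalling), not real model output.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]

score = binary_f1(gold, pred)
print(round(score, 2))  # 0.67 (precision 2/3, recall 2/3)
```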

Limitations and Biases

The human coders and their trainers were mostly Western, educated, industrialized, rich, and democratic (WEIRD), which may have shaped how they evaluated incivility. The trained model will reflect such biases.

Environmental Impact

Citation

@inproceedings{ozler-etal-2020-fine,
    title = "Fine-tuning for multi-domain and multi-label uncivil language detection",
    author = "Ozler, Kadir Bulut  and
      Kenski, Kate  and
      Rains, Steve  and
      Shmargad, Yotam  and
      Coe, Kevin  and
      Bethard, Steven",
    booktitle = "Proceedings of the Fourth Workshop on Online Abuse and Harms",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.alw-1.4",
    doi = "10.18653/v1/2020.alw-1.4",
    pages = "28--33",
}