BERTweet for sexism detection

This is a fine-tuned BERTweet large model (BERTweet: A pre-trained language model for English Tweets) for detecting sexism. It was trained on sexism-socialmedia-balanced, a new, balanced version of the Explainable Detection of Online Sexism (EDOS) dataset, which consists of 16,000 English entries gathered from two social media platforms, Twitter and Gab. The model achieved a Macro-F1 score of 0.85 and an accuracy of 0.88 on the test set of the EDOS task.

How to use

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('tum-nlp/bertweet-sexism')
model = AutoModelForSequenceClassification.from_pretrained('tum-nlp/bertweet-sexism')

# Create the pipeline for classification
sexism_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Predict
sexism_classifier("Girls like attention and they get desperate")
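A text-classification pipeline returns one dictionary per input, with a `label` and a `score` key. The following is a minimal sketch of how several texts could be scored at once and the results flattened into rows. The helper name `classify_batch` and the `min_score` parameter are hypothetical and not part of this model or of `transformers`. The 0.5 default is an arbitrary illustration, and the label strings the model emits are not stated in this card, so the helper passes them through unchanged.

```python
from typing import Any, Callable, Dict, List, Sequence


def classify_batch(
    classifier: Callable[[List[str]], List[Dict[str, Any]]],
    texts: Sequence[str],
    min_score: float = 0.5,  # hypothetical cut-off, chosen only for illustration
) -> List[Dict[str, Any]]:
    """Run a text-classification callable over several texts.

    `classifier` is expected to behave like a transformers
    text-classification pipeline: given a list of strings, it returns
    a list of {"label": str, "score": float} dictionaries, one per input.
    """
    outputs = classifier(list(texts))
    rows = []
    for text, out in zip(texts, outputs):
        rows.append(
            {
                "text": text,
                "label": out["label"],  # label string as emitted by the model
                "score": out["score"],  # confidence of the predicted label
                "confident": out["score"] >= min_score,
            }
        )
    return rows
```

With the pipeline created above, this could be called as `classify_batch(sexism_classifier, ["first text", "second text"], min_score=0.8)`. Rows where `confident` is false may be worth reviewing manually rather than acting on automatically.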

Citation

@inproceedings{rydelek-etal-2023-adamr,
    title = "{A}dam{R} at {S}em{E}val-2023 Task 10: Solving the Class Imbalance Problem in Sexism Detection with Ensemble Learning",
    author = "Rydelek, Adam  and
      Dementieva, Daryna  and
      Groh, Georg",
    editor = {Ojha, Atul Kr.  and
      Do{\u{g}}ru{\"o}z, A. Seza  and
      Da San Martino, Giovanni  and
      Tayyar Madabushi, Harish  and
      Kumar, Ritesh  and
      Sartori, Elisa},
    booktitle = "Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.semeval-1.190",
    doi = "10.18653/v1/2023.semeval-1.190",
    pages = "1371--1381",
    abstract = "The Explainable Detection of Online Sexism task presents the problem of explainable sexism detection through fine-grained categorisation of sexist cases with three subtasks. Our team experimented with different ways to combat class imbalance throughout the tasks using data augmentation and loss alteration techniques. We tackled the challenge by utilising ensembles of Transformer models trained on different datasets, which are tested to find the balance between performance and interpretability. This solution ranked us in the top 40{\%} of teams for each of the tracks.",
}

Licensing Information

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

CC BY-NC-SA 4.0

