slvnwhrl/gbert-face-mask-sentiment

metadata

license: mit
language:
  - de
pipeline_tag: text-classification
tags:
  - public-health
  - twitter
  - sentiment-analysis

TL;DR

This model can be used for sentiment analysis of German tweets discussing the use of face masks during the COVID-19 pandemic.

Check out the paper for details: Guiding Sentiment Analysis with Hierarchical Text Clustering: Analyzing the German X/Twitter Discourse on Face Masks in the 2020 COVID-19 Pandemic
And have a look at our GitHub repo to see how we used this model in combination with hierarchical text clustering! :)

Training

The classifier is based on GBERT-base and was trained in two stages. First, it was further pretrained (continued pretraining) on roughly 340k German tweets discussing masks. Second, it was fine-tuned on an annotated dataset of roughly 2k examples.
The model is trained to classify tweets as neutral, negative, or positive.
The only preprocessing applied to the tweets was replacing URLs with 'https' and user mentions with '@user'.
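Since the model only saw normalized tweets during training, it is probably worth applying the same normalization before inference. The sketch below shows one way to do this in Python. The regular expressions (URLs starting with `http://` or `https://`, Twitter-style handles made of ASCII letters, digits and underscores) are assumptions made for this sketch, not the exact rules used to build the training data, and `preprocess_tweet` is a hypothetical helper name.

```python
import re

# Assumption: URLs start with http:// or https:// and run until the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")
# Assumption: a mention is "@" plus ASCII letters/digits/underscores, not preceded by a
# word character (so e-mail addresses like info@example.de are left untouched).
MENTION_PATTERN = re.compile(r"(?<!\w)@[A-Za-z0-9_]+")


def preprocess_tweet(text: str) -> str:
    """Replace URLs with 'https' and user mentions with '@user'."""
    text = URL_PATTERN.sub("https", text)
    return MENTION_PATTERN.sub("@user", text)


print(preprocess_tweet("@max_muster Maskenpflicht ab Montag: https://t.co/abc123"))
# -> @user Maskenpflicht ab Montag: https
```

URLs are replaced before mentions so that an "@" inside a link is swallowed by the URL replacement rather than turned into a mention.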

Performance

The model achieves a weighted F1-score of 82.36%.
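A weighted F1-score averages the per-class F1-scores, weighting each class c by its share of the true labels: F1_weighted = Σ_c (n_c / N) · F1_c. The pure-Python sketch below illustrates that standard definition on made-up labels (they are not the model's evaluation data); for the same inputs it should agree with scikit-learn's `f1_score(y_true, y_pred, average="weighted")`.

```python
from collections import Counter


def f1_per_class(y_true, y_pred, label):
    """One-vs-rest F1 for a single class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighted by each class's support in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    return sum(support[c] / n * f1_per_class(y_true, y_pred, c) for c in support)


# Toy example (invented labels, for illustration only)
y_true = ["neutral", "neutral", "negative", "positive"]
y_pred = ["neutral", "negative", "negative", "positive"]
print(round(weighted_f1(y_true, y_pred), 4))
# -> 0.75
```

Here neutral and negative each get F1 = 2/3 and positive gets 1.0; neutral counts twice as much because it has two true examples.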

Inference

If you would like to use the model, you can load it with the Transformers library:

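```python
from transformers import pipeline

# Hugging Face Hub ID of this repository
model_path = "slvnwhrl/gbert-face-mask-sentiment"
# Load model and tokenizer from the same repo into a text-classification pipeline
gbert_mask = pipeline("sentiment-analysis", model=model_path, tokenizer=model_path)

# Returns a list with one {"label": ..., "score": ...} dict per input text
gbert_mask("insert some text in German")  # ready to roll
```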
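Because the pipeline also accepts a list of strings, a batch of raw tweets can be normalized and classified in one call. The helper below is a hypothetical sketch (`classify_tweets` is not part of this repo or of Transformers). It applies the URL and mention replacement described under Training, with regular expressions that are assumptions, and takes the classifier as an argument so the `gbert_mask` pipeline can be passed in directly.

```python
import re
from typing import Callable, Dict, List

# Assumed patterns approximating the preprocessing described under Training
URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"(?<!\w)@[A-Za-z0-9_]+")


def preprocess_tweet(text: str) -> str:
    """Replace URLs with 'https' and user mentions with '@user'."""
    text = URL_PATTERN.sub("https", text)
    return MENTION_PATTERN.sub("@user", text)


def classify_tweets(
    texts: List[str], classifier: Callable[[List[str]], List[Dict]]
) -> List[Dict]:
    """Normalize raw tweets like the training data, then classify them as one batch."""
    cleaned = [preprocess_tweet(t) for t in texts]
    return classifier(cleaned)


# Usage with the pipeline built above (needs `transformers` and network access):
# results = classify_tweets(["@anna Maske auf! https://t.co/x"], gbert_mask)
```

The pipeline returns one `{"label": ..., "score": ...}` dict per tweet. The exact label strings come from the model config and can be inspected with `gbert_mask.model.config.id2label`.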

Citation

If you use this model in your research, please cite the paper using:

@inproceedings{wehrli-etal-2024-guiding,
    title = "Guiding Sentiment Analysis with Hierarchical Text Clustering: Analyzing the {G}erman {X}/{T}witter Discourse on Face Masks in the 2020 {COVID}-19 Pandemic",
    author = "Wehrli, Silvan  and
      Ezekannagha, Chisom  and
      Hattab, Georges  and
      Boender, Tamara  and
      Arnrich, Bert  and
      Irrgang, Christopher",
    editor = "De Clercq, Orph{\'e}e  and
      Barriere, Valentin  and
      Barnes, Jeremy  and
      Klinger, Roman  and
      Sedoc, Jo{\~a}o  and
      Tafreshi, Shabnam",
    booktitle = "Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, {\&} Social Media Analysis",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.wassa-1.13",
    pages = "153--167",
    abstract = "Social media are a critical component of the information ecosystem during public health crises. Understanding the public discourse is essential for effective communication and misinformation mitigation. Computational methods can aid these efforts through online social listening. We combined hierarchical text clustering and sentiment analysis to examine the face mask-wearing discourse in Germany during the COVID-19 pandemic using a dataset of 353,420 German X (formerly Twitter) posts from 2020. For sentiment analysis, we annotated a subsample of the data to train a neural network for classifying the sentiments of posts (neutral, negative, or positive). In combination with clustering, this approach uncovered sentiment patterns of different topics and their subtopics, reflecting the online public response to mask mandates in Germany. We show that our approach can be used to examine long-term narratives and sentiment dynamics and to identify specific topics that explain peaks of interest in the social media discourse.",
}