Multilingual Hate Speech Classifier

Model Description

This model is a multilingual hate speech classifier based on the XLM-R architecture, trained to detect hate speech in English, Italian, and Slovene. It is trained on multilingual datasets and learns from disagreement among annotators, which is intended to make it more robust on nuanced hate speech across languages. It was developed as part of my Master's thesis, and the training methodology follows the approach outlined by Kralj Novak et al. (2022) in their paper "Handling Disagreement in Hate Speech Modelling".

Model Details

Model Name: Multilingual Hate Speech Classifier
Model Architecture: XLM-R (XLM-RoBERTa)
Languages Supported: English (EN), Italian (IT), Slovene (SL)

Training Data

The model is trained on a multilingual dataset of Twitter and YouTube comments in EN, IT and SL. The data follow a diamond standard, i.e. an alternative to the gold standard that takes into account the perspectives of multiple annotators. This is particularly useful for highly subjective tasks such as annotating hate speech, where the idea of a single truth may be debatable.

Techniques Used

Multilingual Training: The model is trained on datasets in multiple languages, allowing it to generalize well across different linguistic contexts.
Learning from Disagreement: The model incorporates techniques to learn from annotator disagreement, improving its ability to handle ambiguous and nuanced cases of hate speech.
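
As an illustration of learning from disagreement, the sketch below keeps every annotator's label as its own training example instead of collapsing them into a single majority label. This is one common way to train on multiply-annotated data and is an assumption here, not a description of the exact training pipeline; the names LABELS and expand_annotations and the toy comments are hypothetical.

```python
# Hypothetical label names, listed from least to most severe.
LABELS = ["acceptable", "inappropriate", "offensive", "violent"]


def expand_annotations(items):
    """Turn each (text, [labels...]) item into one training row per annotation.

    Disagreeing annotations are kept, so the same text can appear with
    different labels. This is an illustrative assumption, not the
    confirmed training procedure of this model.
    """
    rows = []
    for text, annotations in items:
        for label in annotations:
            rows.append((text, LABELS.index(label)))
    return rows


# Toy data invented for illustration only.
items = [
    ("comment A", ["acceptable", "acceptable"]),
    ("comment B", ["offensive", "inappropriate"]),
]
rows = expand_annotations(items)
```

With this layout, an ambiguous comment contributes conflicting targets, so a classifier trained on the rows is pushed toward a less confident prediction for that comment.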

Hate Speech Classes

Acceptable: does not present inappropriate, offensive or violent elements.
Inappropriate: contains terms that are obscene or vulgar, but the text is not directed at any specific target.
Offensive: includes offensive generalizations, contempt, dehumanization, or indirect offensive remarks.
Violent: threatens, indulges in, desires or calls for physical violence against a target; it also includes calling for, denying or glorifying war crimes and crimes against humanity.
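
Because the classes are evaluated with an ordinal agreement measure, it can help to treat them as an ordered scale. The sketch below assumes the order in which the classes are listed above and assigns integer ids accordingly; the ids, the HateSpeechClass enum and the severity_gap helper are hypothetical, and the label ids in the model's own configuration may differ.

```python
from enum import IntEnum


class HateSpeechClass(IntEnum):
    # Integer ids are an assumption; check the model's label mapping.
    ACCEPTABLE = 0
    INAPPROPRIATE = 1
    OFFENSIVE = 2
    VIOLENT = 3


def severity_gap(a, b):
    """Number of steps between two classes on the assumed ordinal scale."""
    return abs(int(a) - int(b))


gap = severity_gap(HateSpeechClass.ACCEPTABLE, HateSpeechClass.VIOLENT)
```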

Evaluation Metrics

The model's performance is evaluated using the following metrics:

Krippendorff's Ordinal Alpha
Accuracy
Precision
Recall
F1 Score
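
For reference, the classification metrics listed above can be computed from true and predicted class ids as in the minimal sketch below. It reports accuracy plus per-class precision, recall and F1; whether the reported scores are per-class or averaged is not stated here, so that choice, the function name per_class_metrics and the toy labels are all assumptions.

```python
def per_class_metrics(y_true, y_pred, num_classes=4):
    """Return (accuracy, list of per-class precision/recall/F1 dicts)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    per_class = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # Guard against division by zero when a class is never predicted
        # or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append({"precision": precision, "recall": recall, "f1": f1})
    return accuracy, per_class


# Toy labels invented for illustration only.
y_true = [0, 0, 1, 2, 3, 3]
y_pred = [0, 1, 1, 2, 3, 2]
accuracy, per_class = per_class_metrics(y_true, y_pred)
```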

These metrics are computed for each language separately, as well as across the entire multilingual dataset. Krippendorff's Alpha was used to measure agreement both among the annotators themselves and between the annotators and the model.
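
The following is a minimal sketch of Krippendorff's Alpha with the ordinal distance metric, written from the standard coincidence-matrix definition. It assumes integer class ids on an ordered scale (0 to 3 here); the function name krippendorff_alpha_ordinal and the toy data are hypothetical, and this is not the evaluation code used for the model.

```python
import math
from itertools import permutations


def krippendorff_alpha_ordinal(units, num_classes=4):
    """units: one list of integer labels per item, one label per coder."""
    # Coincidence matrix: every ordered pair of labels from different
    # coders of the same item adds 1 / (m - 1), m = labels on the item.
    o = [[0.0] * num_classes for _ in range(num_classes)]
    for values in units:
        m = len(values)
        if m < 2:
            continue  # items with a single label cannot be paired
        for a, b in permutations(values, 2):
            o[a][b] += 1.0 / (m - 1)

    n_c = [sum(row) for row in o]  # marginal totals per class
    n = sum(n_c)

    def delta_sq(c, k):
        # Ordinal metric: squared count of values between c and k,
        # counting the two end classes at half weight.
        lo, hi = min(c, k), max(c, k)
        return (sum(n_c[lo:hi + 1]) - (n_c[lo] + n_c[hi]) / 2) ** 2

    classes = range(num_classes)
    observed = sum(o[c][k] * delta_sq(c, k) for c in classes for k in classes)
    expected = sum(n_c[c] * n_c[k] * delta_sq(c, k)
                   for c in classes for k in classes)
    if expected == 0:
        return math.nan  # alpha is undefined when only one class occurs
    return 1.0 - (n - 1) * observed / expected


# Toy data invented for illustration: two coders per item.
agree = [[0, 0], [1, 1], [2, 2], [3, 3]]
near_miss = agree + [[0, 1]]  # adjacent classes
far_miss = agree + [[0, 3]]   # opposite ends of the scale

alpha_agree = krippendorff_alpha_ordinal(agree)
alpha_near = krippendorff_alpha_ordinal(near_miss)
alpha_far = krippendorff_alpha_ordinal(far_miss)
```

The ordinal metric penalizes a disagreement between distant classes more than one between adjacent classes, which is why the far-miss data scores lower than the near-miss data. To compare annotators with the model under this sketch, the model's prediction can be passed as one of the labels in each item's list.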

Primary Use Case

The primary use case for this model is to automatically detect and moderate hate speech on social media platforms, online forums, and other digital content platforms. This can help in reducing the spread of harmful content and maintaining a safe online environment.