# BERT for hate speech classification
This model is based on BERT and classifies text as toxic or non-toxic. It achieved an F1 score of 0.81 and an accuracy of 0.77.
The model was fine-tuned on the HateXplain dataset found here: https://huggingface.co/datasets/hatexplain
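For reference, the dataset can be inspected with the `datasets` library. The snippet below is a minimal sketch, not part of this model's code: depending on your `datasets` version, the HateXplain loading script may require `trust_remote_code=True`, and the raw dataset carries per-annotator labels rather than the binary toxic / non-toxic scheme used by this model.

```python
# Minimal sketch for inspecting the HateXplain training data (assumption: not the
# authors' preprocessing code). Drop trust_remote_code=True if your datasets
# version does not accept it.
from datasets import load_dataset

dataset = load_dataset("hatexplain", trust_remote_code=True)
print(dataset)              # available splits and column names
print(dataset["train"][0])  # one example with post tokens and annotator labels
```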
## How to use
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('tum-nlp/bert-hateXplain')
model = AutoModelForSequenceClassification.from_pretrained('tum-nlp/bert-hateXplain')

# Create the pipeline for classification
hate_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Predict
hate_classifier("I like you. I love you")
```
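The pipeline returns a list of dictionaries, one per input, each with a `label` and a `score`. The exact label names depend on the model's configuration (they may be generic names such as `LABEL_0`/`LABEL_1` if no mapping was set). The sketch below only assumes the standard `text-classification` pipeline output and shows how several texts can be classified in one call:

```python
# Classify several texts at once; each result is a dict with 'label' and 'score'.
texts = ["I like you. I love you", "You are a wonderful person"]
for text, result in zip(texts, hate_classifier(texts)):
    print(f"{text!r} -> {result['label']} ({result['score']:.3f})")
```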