BERT for hate speech classification

The model is based on BERT and classifies a text as toxic or non-toxic. It achieved an F1 score of 0.81 and an accuracy of 0.77.
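For readers comparing the two reported numbers, here is a minimal sketch of how binary F1 and accuracy are derived from confusion-matrix counts. The counts are made up for illustration and are not the model's evaluation results. The card does not say which class was treated as positive or how F1 was averaged, so the binary setup is an assumption.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy and F1 for a binary classifier from raw counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # F1 is the harmonic mean of precision and recall; this closed form
    # is equivalent to 2 * P * R / (P + R).
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {"accuracy": accuracy, "f1": f1}

# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
metrics = binary_metrics(tp=8, fp=2, fn=2, tn=8)
print(metrics)
```

F1 ignores true negatives, so it can be higher than accuracy when the positive class is the more common one in the evaluation set.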

The model was fine-tuned on the HateXplain dataset found here: https://huggingface.co/datasets/hatexplain

How to use

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('tum-nlp/bert-hateXplain')
model = AutoModelForSequenceClassification.from_pretrained('tum-nlp/bert-hateXplain')

# Create the pipeline for classification
hate_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Predict
hate_classifier("I like you. I love you")
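The text-classification pipeline returns a list of dictionaries, each with a "label" and a "score" key. The sketch below shows one possible way to turn that output into a yes/no decision with a confidence threshold. The helper name `flag_toxic`, the label strings "toxic" and "non-toxic", and the sample scores are assumptions for illustration. The actual label names for this model should be checked in `model.config.id2label`.

```python
from typing import Dict, List


def flag_toxic(predictions: List[Dict], toxic_label: str,
               threshold: float = 0.5) -> List[bool]:
    """Return True where the predicted label is `toxic_label`
    and its score is at least `threshold`."""
    return [
        p["label"] == toxic_label and p["score"] >= threshold
        for p in predictions
    ]


# Hand-written stand-in for pipeline output (hypothetical labels and scores),
# so the helper can be tried without downloading the model.
sample_output = [
    {"label": "non-toxic", "score": 0.97},
    {"label": "toxic", "score": 0.62},
    {"label": "toxic", "score": 0.41},
]

flags = flag_toxic(sample_output, toxic_label="toxic", threshold=0.5)
print(flags)
```

With the real model, the same helper could be called as `flag_toxic(hate_classifier(texts), toxic_label=...)`, where `texts` is a list of strings and the label is whatever `model.config.id2label` reports for the toxic class.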
