HaT5 (T5-base)

This model is T5 (base) fine-tuned on a hate speech detection dataset. It is intended for classifying Tweets as hateful/offensive or not (0 - HOF (hate/offensive); 1 - NOT). The task prefix we used for the T5 model is 'classification: '.
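The label convention and task prefix described above can be captured in two small helpers. This is a minimal sketch: the names `TASK_PREFIX`, `LABELS`, `with_prefix`, and `to_label` are illustrative and not part of the released model or the `transformers` API. It assumes the decoded model output is the bare string "0" or "1"; whether the prefix must also be prepended at inference time is not stated here, so treat `with_prefix` as optional.

```python
# Illustrative helpers; these names are hypothetical, not part of the model release.
TASK_PREFIX = "classification: "   # task prefix stated in this model card
LABELS = {"0": "HOF", "1": "NOT"}  # 0 = hate/offensive, 1 = not


def with_prefix(tweet: str) -> str:
    """Prepend the task prefix unless the text already starts with it."""
    return tweet if tweet.startswith(TASK_PREFIX) else TASK_PREFIX + tweet


def to_label(prediction: str) -> str:
    """Map a decoded output string to a label name; anything else is flagged."""
    return LABELS.get(prediction.strip(), "UNKNOWN")


print(with_prefix("nothing new over here"))
print(to_label("0"), to_label("1"))
```

Returning "UNKNOWN" for unexpected strings is a design choice for this sketch: a generative model is not guaranteed to emit only the two label tokens, so the caller can decide how to handle anything else.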

More information about the original pre-trained model can be found here

Classification examples:

Prediction	Tweet
0	Why the fuck I got over 1000 views on my story 😂😂 nothing new over here
1	first of all there is no vaccine to cure , whthr it is capsules, tablets, or injections, they just support to fight with d virus. I do not support people taking any kind of home remedies n making fun of an ayurvedic medicine..😐

More Details

For more details about the datasets and evaluation results, see our paper for this work here. The paper was accepted at the International Joint Conference on Neural Networks (IJCNN) 2022.

How to use


from transformers import T5ForConditionalGeneration, T5Tokenizer
import torch

# Load the fine-tuned model and the base T5 tokenizer.
model = T5ForConditionalGeneration.from_pretrained("sana-ngu/HaT5")
tokenizer = T5Tokenizer.from_pretrained("t5-base")
tokenizer.pad_token = tokenizer.eos_token

# Tweet to classify.
text = "Old lions in the wild lay down and die with dignity when they can't hunt anymore. If a government is having 'teething problems' handling aid supplies one full year into a pandemic, maybe it should take a cue and get the fuck out of the way? "
input_ids = tokenizer(text, padding=True, truncation=True, return_tensors='pt').input_ids

# Generate the label and decode it to text: "0" (HOF) or "1" (NOT).
outputs = model.generate(input_ids)
pred = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(pred)
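To classify several Tweets at once, the same steps can be wrapped in a function. The following is a sketch under stated assumptions rather than the authors' method: `classify_batch` and its `max_new_tokens` default are hypothetical, and `model` and `tokenizer` are expected to be the objects loaded in the snippet above. It passes the attention mask to `generate` so that padded positions in shorter Tweets are ignored.

```python
from typing import List


def classify_batch(tweets: List[str], model, tokenizer, max_new_tokens: int = 4) -> List[str]:
    """Return one decoded prediction string per input tweet.

    `model` and `tokenizer` are assumed to be a T5ForConditionalGeneration
    and a T5Tokenizer; this helper itself is illustrative.
    """
    # Pad to the longest tweet in the batch and truncate overly long ones.
    enc = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
    outputs = model.generate(
        input_ids=enc.input_ids,
        attention_mask=enc.attention_mask,  # mask out padding tokens
        max_new_tokens=max_new_tokens,      # labels are short, so a small cap suffices
    )
    # Decode every generated sequence, dropping pad/eos tokens.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

For example, `classify_batch(["first tweet", "second tweet"], model, tokenizer)` would return a list of two decoded strings, one per Tweet, in input order.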