ViHateT5: Enhancing Hate Speech Detection in Vietnamese with A Unified Text-to-Text Transformer Model | ACL'2024 (Findings)

Disclaimer: This paper contains examples from actual content on social media platforms that could be considered toxic and offensive.

ViHateT5-HSD is the ViHateT5 model fine-tuned on multiple Vietnamese hate speech detection benchmark datasets.

The architecture and experimental results of ViHateT5 can be found in the paper:

@misc{nguyen2024vihatet5,
      title={ViHateT5: Enhancing Hate Speech Detection in Vietnamese With A Unified Text-to-Text Transformer Model}, 
      author={Luan Thanh Nguyen},
      year={2024},
      eprint={2405.14141},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

The pre-training dataset, VOZ-HSD, is available HERE.

Kindly CITE our paper if you use ViHateT5-HSD to generate published results or integrate it into other software.

Example usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("tarudesu/ViHateT5-base-HSD")
model = AutoModelForSeq2SeqLM.from_pretrained("tarudesu/ViHateT5-base-HSD")

def generate_output(input_text, prefix):
    # Add prefix
    prefixed_input_text = prefix + ': ' + input_text

    # Tokenize input text
    input_ids = tokenizer.encode(prefixed_input_text, return_tensors="pt")

    # Generate output
    output_ids = model.generate(input_ids, max_length=256)

    # Decode the generated output
    output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return output_text

sample = 'Tôi ghét bạn vl luôn!'
prefix = 'hate-spans-detection' # Choose 1 from 3 prefixes ['hate-speech-detection', 'toxic-speech-detection', 'hate-spans-detection']

result = generate_output(sample, prefix)
print('Result: ', result)
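
Since the task is selected only through the text prefix, a mistyped prefix raises no error on its own: the tokenizer accepts any string. The following is a minimal sketch of a guard against that. `TASK_PREFIXES` and `build_input` are hypothetical helper names, not part of the released code; the three prefix strings and the `prefix: text` layout are taken from the example above.

```python
# Hypothetical helper (not part of the ViHateT5 release): validates the task
# prefix before building the model input string.

# The three prefixes listed in the usage example.
TASK_PREFIXES = (
    "hate-speech-detection",
    "toxic-speech-detection",
    "hate-spans-detection",
)


def build_input(input_text: str, prefix: str) -> str:
    """Return '<prefix>: <text>', rejecting prefixes the model card does not list."""
    if prefix not in TASK_PREFIXES:
        raise ValueError(
            f"Unknown prefix {prefix!r}; expected one of {', '.join(TASK_PREFIXES)}"
        )
    return prefix + ": " + input_text


# Build one input per task for the same sample sentence.
sample = "Tôi ghét bạn vl luôn!"
inputs_by_task = {p: build_input(sample, p) for p in TASK_PREFIXES}
for task, text in inputs_by_task.items():
    print(task, "->", text)
```

Each resulting string can then be tokenized and passed to `model.generate` in the same way as `prefixed_input_text` in `generate_output`.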

Please feel free to contact us by email at luannt@uit.edu.vn if you have any further questions!
