ViHateT5: Enhancing Hate Speech Detection in Vietnamese with A Unified Text-to-Text Transformer Model | ACL'2024 (Findings)

Disclaimer: This paper contains examples from actual content on social media platforms that could be considered toxic and offensive.

ViHateT5 is the state-of-the-art pre-trained text-to-text transformer model for Vietnamese hate speech detection (HSD) tasks. Note that this checkpoint needs to be fine-tuned on downstream tasks, especially hate speech detection ones (ViHateT5-HSD is the fine-tuned model mentioned in the paper).

The architecture and experimental results of ViHateT5 can be found in the paper:

@inproceedings{thanh-nguyen-2024-vihatet5,
    title = "{V}i{H}ate{T}5: Enhancing Hate Speech Detection in {V}ietnamese With a Unified Text-to-Text Transformer Model",
    author = "Thanh Nguyen, Luan",
    editor = "Ku, Lun-Wei  and Martins, Andre  and Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.355",
    pages = "5948--5961"
    }

The pre-training dataset, VOZ-HSD, is available HERE.

Kindly CITE our paper if you use ViHateT5 to generate published results or integrate it into other software.

Example usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("tarudesu/ViHateT5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("tarudesu/ViHateT5-base")
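
Because the model is text-to-text, a downstream task is cast as mapping an input string to an output string. The sketch below shows one possible way to wrap that once a checkpoint has been fine-tuned. The helper names `build_input` and `generate_output`, the `"<prefix>: <text>"` input format, and the `hate-speech-detection` prefix are assumptions made for illustration and are not taken from this model card; the pre-trained checkpoint loaded above is not expected to give meaningful labels before fine-tuning.

```python
def build_input(task_prefix: str, text: str) -> str:
    """Prepend a task prefix in the common T5 style: '<prefix>: <text>'.

    The prefix format is an assumption for illustration only.
    """
    return f"{task_prefix}: {text.strip()}"


def generate_output(model, tokenizer, text: str, task_prefix: str,
                    max_length: int = 256) -> str:
    """Run one text-to-text prediction with an already-loaded model and tokenizer."""
    # Tokenize the prefixed input into PyTorch tensors.
    inputs = tokenizer(build_input(task_prefix, text), return_tensors="pt")
    # Decode up to max_length tokens with the seq2seq model.
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_length,
    )
    # Convert the generated token ids back to a plain string.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as in the example above (ideally from a fine-tuned checkpoint), a call would look like `generate_output(model, tokenizer, "some Vietnamese comment", "hate-speech-detection")`.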

Please feel free to contact us by email at luannt@uit.edu.vn if you need any further information.
