
ViHateT5: Enhancing Hate Speech Detection in Vietnamese with A Unified Text-to-Text Transformer Model | ACL'2024 (Findings)

Disclaimer: This paper contains examples from actual content on social media platforms that could be considered toxic and offensive.

ViHateT5 is a state-of-the-art pre-trained text-to-text transformer model for Vietnamese hate speech detection (HSD) tasks. Note that this checkpoint needs to be fine-tuned on downstream tasks, particularly hate speech detection; ViHateT5-HSD is the fine-tuned model described in the paper.

The architecture and experimental results of ViHateT5 can be found in the paper:

@misc{nguyen2024vihatet5,
      title={ViHateT5: Enhancing Hate Speech Detection in Vietnamese With A Unified Text-to-Text Transformer Model}, 
      author={Luan Thanh Nguyen},
      year={2024},
      eprint={2405.14141},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

The pre-training dataset, VOZ-HSD, is available HERE.

Please cite our paper if you use ViHateT5 to produce published results or integrate it into other software.

Example usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("tarudesu/ViHateT5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("tarudesu/ViHateT5-base")
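
Because this checkpoint must be fine-tuned before it is useful for classification, each labeled example first has to be cast as a pair of strings (input text, target text). The following is a minimal sketch of that step only, not the authors' training pipeline. The helper `to_text_to_text`, the `hate-speech-detection` prefix, and the `CLEAN`/`HATE` label strings are illustrative assumptions; check the paper for the exact formats used.

```python
from dataclasses import dataclass


@dataclass
class Seq2SeqExample:
    source: str  # text fed to the encoder
    target: str  # text the decoder should learn to produce


def to_text_to_text(text: str, label: str,
                    task_prefix: str = "hate-speech-detection") -> Seq2SeqExample:
    """Cast one labeled comment into a (source, target) string pair.

    The prefix and label strings are assumptions made for illustration.
    """
    # Collapse runs of whitespace so the prefix and comment join predictably.
    cleaned = " ".join(text.split())
    return Seq2SeqExample(source=f"{task_prefix}: {cleaned}", target=label.strip())


# Hypothetical placeholder rows; real rows would come from a labeled HSD dataset.
rows = [("  comment   one ", "CLEAN"), ("comment two", "HATE")]
examples = [to_text_to_text(text, label) for text, label in rows]

for ex in examples:
    print(ex.source, "->", ex.target)
```

The resulting `source` and `target` strings could then be tokenized with the `tokenizer` loaded in the example usage and passed to a standard sequence-to-sequence training loop.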

Please feel free to contact us by email at luannt@uit.edu.vn if you need any further information.

