ViHateT5: Enhancing Hate Speech Detection in Vietnamese with A Unified Text-to-Text Transformer Model | ACL'2024 (Findings)
Disclaimer: This paper contains examples from actual content on social media platforms that could be considered toxic and offensive.
ViHateT5 is a pre-trained text-to-text transformer model for Vietnamese that is reported in the paper to achieve state-of-the-art results on hate speech detection (HSD) tasks. Note that this checkpoint needs to be fine-tuned on downstream tasks, especially hate speech detection ones (ViHateT5-HSD is the fine-tuned model mentioned in the paper).
The architecture and experimental results of ViHateT5 can be found in the paper:
@inproceedings{thanh-nguyen-2024-vihatet5,
title = "{V}i{H}ate{T}5: Enhancing Hate Speech Detection in {V}ietnamese With a Unified Text-to-Text Transformer Model",
author = "Thanh Nguyen, Luan",
editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand and virtual meeting",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.355",
pages = "5948--5961"
}
The pre-training dataset, VOZ-HSD, is available HERE.
Kindly CITE our paper if you use ViHateT5 to generate published results or integrate it into other software.
Example usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# Load the pre-trained (not yet fine-tuned) checkpoint from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("tarudesu/ViHateT5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("tarudesu/ViHateT5-base")
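Because this checkpoint still has to be fine-tuned, labelled samples first need to be cast as pairs of input and output strings. The following is a minimal sketch of that step, not the exact recipe from the paper: the helper names `to_text_to_text` and `tokenize_example`, the task prefix `hate-speech-detection`, the label string `CLEAN`, and the `max_length` value are all assumptions chosen for illustration.

```python
def to_text_to_text(text, label, task_prefix="hate-speech-detection"):
    """Cast one labelled sample into a source/target string pair.

    `task_prefix` and the label strings are illustrative assumptions;
    use whatever convention your fine-tuning setup defines.
    """
    return {
        "source": f"{task_prefix}: {text.strip()}",  # text fed to the encoder
        "target": label.strip(),                     # text the decoder should produce
    }


def tokenize_example(example, model_name="tarudesu/ViHateT5-base", max_length=256):
    """Tokenize a source/target pair into model inputs with `labels`."""
    # Imported lazily so the formatting helper works without transformers installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # `text_target` makes the tokenizer encode the target string as `labels`.
    return tokenizer(
        example["source"],
        text_target=example["target"],
        max_length=max_length,  # assumed limit, tune for your data
        truncation=True,
    )


if __name__ == "__main__":
    example = to_text_to_text("  Bài viết rất hay  ", "CLEAN")
    print(example["source"])
    print(example["target"])
```

The resulting `input_ids` and `labels` can then be passed to a standard sequence-to-sequence training loop. Keeping the formatting step separate from tokenization makes it easy to swap in a different prefix for each downstream task.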
Please feel free to contact us by email at luannt@uit.edu.vn if you need any further information!