---
base_model: tarudesu/ViHateT5-base
tags:
- generated_from_trainer
model-index:
- name: ViHateT5-base-HSD
results: []
datasets:
- tarudesu/ViCTSD
- tarudesu/ViHOS
- tarudesu/ViHSD
language:
- vi
metrics:
- f1
- accuracy
pipeline_tag: text2text-generation
widget:
- text: "toxic-speech-detection: Nhìn bà không thể không nhớ đến các phim phù thủy"
- text: "hate-speech-detection: thằng đó trông đần vcl ấy nhỉ"
- text: "hate-spans-detection: trông như cl"
---
# <a name="introduction"></a>ViHateT5: Enhancing Hate Speech Detection in Vietnamese with A Unified Text-to-Text Transformer Model | ACL'2024 (Findings)
**Disclaimer**: This paper contains examples of actual content from social media platforms that could be considered toxic and offensive.

ViHateT5-base-HSD is the fine-tuned version of [ViHateT5](https://huggingface.co/tarudesu/ViHateT5-base) on multiple Vietnamese hate speech detection benchmark datasets (ViCTSD, ViHOS, and ViHSD).

The architecture and experimental results of ViHateT5 can be found in the [paper](https://arxiv.org/abs/2405.14141):
```bibtex
@misc{nguyen2024vihatet5,
      title={ViHateT5: Enhancing Hate Speech Detection in Vietnamese With A Unified Text-to-Text Transformer Model},
      author={Luan Thanh Nguyen},
      year={2024},
      eprint={2405.14141},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
The pre-training dataset, named VOZ-HSD, is available [HERE](https://huggingface.co/datasets/tarudesu/VOZ-HSD).
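For convenience, the corpus can be pulled directly from the Hub with the `datasets` library. This is a minimal sketch; the split and column names are not documented here, so check the dataset card for the exact schema.

```python
from datasets import load_dataset

# Load the VOZ-HSD pre-training corpus from the Hugging Face Hub.
# Split/column names are assumptions; inspect the printed DatasetDict
# to see the actual structure.
voz_hsd = load_dataset("tarudesu/VOZ-HSD")
print(voz_hsd)
```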
Kindly **CITE** our paper if you use ViHateT5-base-HSD to generate published results or incorporate it into other software.

**Example usage**
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("tarudesu/ViHateT5-base-HSD")
model = AutoModelForSeq2SeqLM.from_pretrained("tarudesu/ViHateT5-base-HSD")

def generate_output(input_text, prefix):
    # Prepend the task prefix so the model knows which task to run
    prefixed_input_text = prefix + ': ' + input_text

    # Tokenize the input text
    input_ids = tokenizer.encode(prefixed_input_text, return_tensors="pt")

    # Generate the output sequence
    output_ids = model.generate(input_ids, max_length=256)

    # Decode the generated tokens back into text
    output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return output_text

sample = 'Tôi ghét bạn vl luôn!'  # Roughly: "I hate you so much!" ("vl" is a Vietnamese expletive)
prefix = 'hate-spans-detection'  # Choose one of ['hate-speech-detection', 'toxic-speech-detection', 'hate-spans-detection']

result = generate_output(sample, prefix)
print('Result: ', result)
```
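Since the model is tagged as `text2text-generation`, the same three tasks can also be run through the standard `pipeline` API by switching the prefix. The snippet below is a minimal sketch using default pipeline settings; the exact output strings (labels or extracted spans) depend on the fine-tuning data and are not guaranteed here.

```python
from transformers import pipeline

# Text-to-text pipeline wrapping ViHateT5-base-HSD (sketch, default settings).
hsd = pipeline("text2text-generation", model="tarudesu/ViHateT5-base-HSD")

sample = 'Tôi ghét bạn vl luôn!'
for prefix in ['hate-speech-detection', 'toxic-speech-detection', 'hate-spans-detection']:
    # The same model handles all three tasks; only the prefix changes.
    output = hsd(f'{prefix}: {sample}', max_length=256)[0]['generated_text']
    print(f'{prefix}: {output}')
```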
Please feel free to contact us by email at luannt@uit.edu.vn if you need any further information!