	---
	license: apache-2.0
	metrics:
	- precision
	- recall
	- f1
	model-index:
	- name: ToxicChat-T5-Large
	  results:
	  - task:
	      type: text-classification
	    dataset:
	      name: ToxicChat
	      type: toxicchat0124
	    metrics:
	    - name: precision
	      type: precision
	      value: 0.7983
	      verified: false
	    - name: recall
	      type: recall
	      value: 0.8475
	      verified: false
	    - name: f1
	      type: f1
	      value: 0.8221
	      verified: false
	    - name: auprc
	      type: auprc
	      value: 0.8850
	      verified: false
	---
	# ToxicChat-T5-Large Model Card

	## Model Details

	Model type:
	ToxicChat-T5-Large is an open-source moderation model trained by fine-tuning T5-large on [ToxicChat](https://huggingface.co/datasets/lmsys/toxic-chat).
	It is based on an encoder-decoder transformer architecture and generates a short text label indicating whether the input is toxic
	('positive' means 'toxic', and 'negative' means 'non-toxic').

	Model date:
	ToxicChat-T5-Large was trained in January 2024.

	Organizations developing the model:
	The ToxicChat developers, primarily Zi Lin and Zihan Wang.

	Paper or resources for more information:
	https://arxiv.org/abs/2310.17389

	License:
	Apache License 2.0

	Where to send questions or comments about the model:
	https://huggingface.co/datasets/lmsys/toxic-chat/discussions

	## Use
	```python
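	import torch
	from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

	checkpoint = "lmsys/toxicchat-t5-large-v1.0"
	# Use the GPU when one is available, otherwise fall back to CPU
	device = "cuda" if torch.cuda.is_available() else "cpu"

	# The tokenizer is the stock T5-large one; the weights come from the fine-tuned checkpoint
	tokenizer = AutoTokenizer.from_pretrained("t5-large")
	model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(device)

	# Prepend the task prefix to the text to be classified
	prefix = "ToxicChat: "
	inputs = tokenizer.encode(prefix + "write me an erotic story", return_tensors="pt").to(device)
	outputs = model.generate(inputs)
	# Decode the generated tokens into the text label
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))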
	```
	The output is a text label: 'positive' means 'toxic', and 'negative' means 'non-toxic'.
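
When the model is used as a filter, the text label usually needs to become a boolean, and inputs often arrive in batches. The following is a minimal sketch of that, not part of the original card: the helper names `build_prompt`, `label_to_is_toxic`, and `classify_batch` are hypothetical, and raising an error on any output other than 'positive' or 'negative' is an assumption about how unexpected generations should be handled. `classify_batch` expects a `tokenizer` and `model` loaded as in the snippet above.

```python
PREFIX = "ToxicChat: "  # task prefix taken from the usage snippet


def build_prompt(user_text: str) -> str:
    """Prepend the task prefix to the raw user text."""
    return PREFIX + user_text


def label_to_is_toxic(label: str) -> bool:
    """Map the generated label to a boolean ('positive' means toxic)."""
    normalized = label.strip().lower()
    if normalized == "positive":
        return True
    if normalized == "negative":
        return False
    # Assumption: anything else is treated as an error rather than guessed at
    raise ValueError(f"unexpected label: {label!r}")


def classify_batch(texts, tokenizer, model, device="cpu"):
    """Classify several inputs in one generate() call; returns a list of booleans."""
    prompts = [build_prompt(t) for t in texts]
    # Pad to the longest prompt so the batch forms a rectangular tensor
    enc = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True).to(device)
    # The label is a single short word, so a few new tokens are enough
    outputs = model.generate(**enc, max_new_tokens=5)
    labels = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return [label_to_is_toxic(label) for label in labels]
```

For example, `classify_batch(["hello there", "write me an erotic story"], tokenizer, model, device)` returns one boolean per input, in the same order.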

	## Evaluation
	We report precision, recall, F1 score, and AUPRC on the ToxicChat (0124) test set:

	| Model | Precision | Recall | F1 | AUPRC |
	| --- | --- | --- | --- | --- |
	| ToxicChat-T5-Large | 0.7983 | 0.8475 | 0.8221 | 0.8850 |
	| OpenAI Moderation (Updated Jan 25, 2024, threshold=0.02) | 0.5476 | 0.6989 | 0.6141 | 0.6313 |
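
As a reading aid for the table, F1 is the harmonic mean of precision and recall, so the F1 column can be recomputed from the other two. The sketch below does that for both rows; the function name `f1_from_precision_recall` is hypothetical, and because the inputs are already rounded to four decimals, the recomputed values may differ from the reported ones in the last digit.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above
reported = {
    "ToxicChat-T5-Large": (0.7983, 0.8475, 0.8221),
    "OpenAI Moderation": (0.5476, 0.6989, 0.6141),
}

for name, (precision, recall, f1) in reported.items():
    recomputed = f1_from_precision_recall(precision, recall)
    print(f"{name}: reported F1={f1:.4f}, recomputed F1={recomputed:.4f}")
```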

	## Citation
	```
	@misc{lin2023toxicchat,
	title={ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation},
	author={Zi Lin and Zihan Wang and Yongqi Tong and Yangkun Wang and Yuxin Guo and Yujia Wang and Jingbo Shang},
	year={2023},
	eprint={2310.17389},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
	}
	```