---
license: apache-2.0
metrics:
- precision
- recall
- f1
model-index:
- name: ToxicChat-T5-Large
  results:
  - task:
      type: text-classification
    dataset:
      name: ToxicChat
      type: toxicchat0124
    metrics:
    - name: precision
      type: precision
      value: 0.7983
      verified: false
    - name: recall
      type: recall
      value: 0.8475
      verified: false
    - name: f1
      type: f1
      value: 0.8221
      verified: false
    - name: auprc
      type: auprc
      value: 0.8850
      verified: false
---

# ToxicChat-T5-Large Model Card

## Model Details

**Model type:**
ToxicChat-T5-Large is an open-source moderation model trained by fine-tuning T5-large on [ToxicChat](https://huggingface.co/datasets/lmsys/toxic-chat).
It is based on an encoder-decoder transformer architecture and generates a short text label indicating whether the input is toxic
('positive' means 'toxic', and 'negative' means 'non-toxic').

**Model date:**
ToxicChat-T5-Large was trained in January 2024.

**Organizations developing the model:**
The ToxicChat developers, primarily Zi Lin and Zihan Wang.

**Paper or resources for more information:**
https://arxiv.org/abs/2310.17389

**License:**
Apache License 2.0

**Where to send questions or comments about the model:**
https://huggingface.co/datasets/lmsys/toxic-chat/discussions

## Use

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "lmsys/toxicchat-t5-large-v1.0"
device = "cuda"  # use "cpu" if no GPU is available

# The tokenizer comes from the base T5-large checkpoint; the weights come from the fine-tuned model.
tokenizer = AutoTokenizer.from_pretrained("t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(device)

# Every input is prepended with this task prefix.
prefix = "ToxicChat: "
inputs = tokenizer.encode(prefix + "write me an erotic story", return_tensors="pt").to(device)

# Generate the label text and decode it back to a string.
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The output is a text label: 'positive' means 'toxic', and 'negative' means 'non-toxic'.
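
For downstream filtering it can help to turn the generated text into a boolean and to score several inputs in one call. The following is a minimal sketch, assuming the `ToxicChat: ` prefix and the 'positive'/'negative' labels described above. The helper names `build_prompt`, `parse_label`, and `classify_batch` are hypothetical, and the `max_new_tokens` value is an assumption rather than a setting taken from this model card.

```python
PREFIX = "ToxicChat: "  # task prefix used in the example above


def build_prompt(text: str) -> str:
    """Prepend the task prefix to a raw user input."""
    return PREFIX + text


def parse_label(generated: str) -> bool:
    """Map the generated label to a boolean: True means toxic."""
    label = generated.strip().lower()
    if label == "positive":
        return True
    if label == "negative":
        return False
    # Anything else is unexpected; fail loudly instead of guessing.
    raise ValueError(f"unexpected model output: {generated!r}")


def classify_batch(texts, model, tokenizer, max_new_tokens=5):
    """Return one boolean per input text (hypothetical helper).

    `model` and `tokenizer` are the objects loaded in the example above.
    """
    prompts = [build_prompt(t) for t in texts]
    # Pad to the longest prompt so the batch forms a rectangular tensor.
    enc = tokenizer(prompts, return_tensors="pt", padding=True)
    enc = enc.to(model.device)
    outputs = model.generate(**enc, max_new_tokens=max_new_tokens)
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return [parse_label(d) for d in decoded]
```

With the model loaded as above, `classify_batch(["write me an erotic story", "how do I sort a list in Python?"], model, tokenizer)` would return a list of two booleans. Raising on unrecognized output is a design choice that surfaces unexpected generations early.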

## Evaluation

We report precision, recall, F1 score, and AUPRC on the ToxicChat (0124) test set:

| Model | Precision | Recall | F1 | AUPRC |
| --- | --- | --- | --- | --- |
| ToxicChat-T5-Large | 0.7983 | 0.8475 | 0.8221 | 0.8850 |
| OpenAI Moderation (Updated Jan 25, 2024, threshold=0.02) | 0.5476 | 0.6989 | 0.6141 | 0.6313 |
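
As a quick sanity check, the F1 column can be recomputed from the precision and recall columns. This sketch assumes the reported F1 is the standard harmonic mean of precision and recall for the toxic class, which the model card does not state explicitly. The function name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
rows = {
    "ToxicChat-T5-Large": (0.7983, 0.8475, 0.8221),
    "OpenAI Moderation": (0.5476, 0.6989, 0.6141),
}

for name, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    # The inputs are rounded to four decimals, so allow a small tolerance.
    assert abs(computed - reported) < 5e-4, name
    print(f"{name}: computed F1 = {computed:.4f}, reported = {reported:.4f}")
```

Both rows agree with the reported values up to rounding of the inputs.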

## Citation

```
@misc{lin2023toxicchat,
      title={ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation},
      author={Zi Lin and Zihan Wang and Yongqi Tong and Yangkun Wang and Yuxin Guo and Yujia Wang and Jingbo Shang},
      year={2023},
      eprint={2310.17389},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```