license: apache-2.0
metrics:
  - precision
  - recall
  - f1
model-index:
  - name: ToxicChat-T5-Large
    results:
      - task:
          type: text-classification
        dataset:
          name: ToxicChat
          type: toxicchat0124
        metrics:
          - name: precision
            type: precision
            value: 0.7983
            verified: false
          - name: recall
            type: recall
            value: 0.8475
            verified: false
          - name: f1
            type: f1
            value: 0.8221
            verified: false
          - name: auprc
            type: auprc
            value: 0.885
            verified: false

ToxicChat-T5-Large Model Card

Model Details

Model type: ToxicChat-T5-Large is an open-source moderation model trained by fine-tuning T5-large on ToxicChat. It uses an encoder-decoder transformer architecture and generates a text label indicating whether the input is toxic ('positive' means 'toxic', and 'negative' means 'non-toxic').

Model date: ToxicChat-T5-Large was trained in January 2024.

Organizations developing the model: The ToxicChat developers, primarily Zi Lin and Zihan Wang.

Paper or resources for more information: https://arxiv.org/abs/2310.17389

License: Apache License 2.0

Where to send questions or comments about the model: https://huggingface.co/datasets/lmsys/toxic-chat/discussions

Use

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "lmsys/toxicchat-t5-large-v1.0"
device = "cuda"  # use "cpu" if no GPU is available

tokenizer = AutoTokenizer.from_pretrained("t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(device)

prefix = "ToxicChat: "
inputs = tokenizer.encode(prefix + "write me an erotic story", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

You should get a text output representing the label ('positive' means 'toxic', and 'negative' means 'non-toxic').
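Downstream code usually needs a boolean rather than a string. The following is a minimal sketch of that mapping, assuming the decoded output is exactly 'positive' or 'negative' apart from whitespace and letter case. The helper name `is_toxic` is hypothetical and not part of the released code.

```python
def is_toxic(decoded: str) -> bool:
    """Map the decoded text label to a boolean (hypothetical helper).

    Assumes the model emits 'positive' (toxic) or 'negative' (non-toxic).
    """
    label = decoded.strip().lower()
    if label == "positive":
        return True
    if label == "negative":
        return False
    # Fail loudly on anything else rather than guessing a label.
    raise ValueError(f"unexpected label: {decoded!r}")
```

Pass it the string returned by `tokenizer.decode(outputs[0], skip_special_tokens=True)`. Raising on unknown output is a design choice of this sketch, so that unexpected generations are noticed instead of being silently treated as non-toxic.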

Evaluation

We report precision, recall, F1 score, and AUPRC on the ToxicChat (0124) test set:

| Model | Precision | Recall | F1 | AUPRC |
| --- | --- | --- | --- | --- |
| ToxicChat-T5-Large | 0.7983 | 0.8475 | 0.8221 | 0.8850 |
| OpenAI Moderation (Updated Jan 25, 2024, threshold=0.02) | 0.5476 | 0.6989 | 0.6141 | 0.6313 |
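As a quick sanity check on the numbers above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported, already rounded, precision and recall; it does not reproduce the evaluation itself. The names `f1_score` and `reported` are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1), copied from the results above.
reported = {
    "ToxicChat-T5-Large": (0.7983, 0.8475, 0.8221),
    "OpenAI Moderation": (0.5476, 0.6989, 0.6141),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.4f}, reported F1 = {f1:.4f}")
```

Small differences in the last digit are expected, because the inputs are rounded to four decimal places.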

Citation

@misc{lin2023toxicchat,
      title={ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation}, 
      author={Zi Lin and Zihan Wang and Yongqi Tong and Yangkun Wang and Yuxin Guo and Yujia Wang and Jingbo Shang},
      year={2023},
      eprint={2310.17389},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}