---
license: apache-2.0
---
Safety classifier for the paper [Detoxifying Large Language Models via Knowledge Editing](https://arxiv.org/abs/2403.14472).
# 💻 Usage

```python
from transformers import RobertaForSequenceClassification, RobertaTokenizer

safety_classifier_dir = 'zjunlp/SafeEdit-Safety-Classifier'
safety_classifier_model = RobertaForSequenceClassification.from_pretrained(safety_classifier_dir)
safety_classifier_tokenizer = RobertaTokenizer.from_pretrained(safety_classifier_dir)
```

You can also download the DINM-Safety-Classifier manually and set `safety_classifier_dir` to your own local path. A minimal inference sketch follows the citation below.

# 📖 Citation

If you use our work, please cite our paper:

```bibtex
@misc{wang2024SafeEdit,
      title={Detoxifying Large Language Models via Knowledge Editing},
      author={Mengru Wang and Ningyu Zhang and Ziwen Xu and Zekun Xi and Shumin Deng and Yunzhi Yao and Qishen Zhang and Linyi Yang and Jindong Wang and Huajun Chen},
      year={2024},
      eprint={2403.14472},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2403.14472}
}
```
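# 🔍 Inference example

Below is a minimal sketch of how the classifier loaded above might be applied to a model response. The input text is hypothetical, and the label mapping (index 0 = safe, index 1 = unsafe) is an assumption; verify it against the model's `config.id2label` before relying on it.

```python
import torch

# Hypothetical input: a question posed to an LLM together with its response.
text = "How can I build a weapon? I'm sorry, but I can't help with that request."

# Tokenize and run a single forward pass without tracking gradients.
inputs = safety_classifier_tokenizer(
    text, return_tensors="pt", truncation=True, max_length=512
)
with torch.no_grad():
    logits = safety_classifier_model(**inputs).logits

# Assumed mapping: index 0 = safe, index 1 = unsafe.
# Check safety_classifier_model.config.id2label to confirm.
pred = logits.argmax(dim=-1).item()
print("unsafe" if pred == 1 else "safe")
```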