qiuhuachuan committed on
Commit df1fc48 • 1 Parent(s): 4e0f044

Update readme.md

Files changed (1)
  1. README.md +19 -70

README.md CHANGED
@@ -8,78 +8,28 @@ tags:
  ---
  <div align="center">
  <h1>
- Facilitating NSFW Text Detection in Open-Domain Dialogue Systems via Knowledge Distillation
  </h1>
  </div>

  <p align="center">
- ⚙️ <a href="https://github.com/qiuhuachuan/CensorChat" target="_blank">GitHub</a> •
- 📄 <a href="https://arxiv.org/pdf/2309.09749v2.pdf" target="_blank">Paper</a> •
- 🤗 <a href="https://huggingface.co/qiuhuachuan/NSFW-detector" target="_blank">Model</a>
  </p>

  ## Overview

- _CensorChat_ is a dialogue monitoring dataset aimed at NSFW dialogue detection. Leveraging knowledge distillation with GPT-4 and ChatGPT, the dataset offers a cost-effective way to construct NSFW content detectors. The process entails collecting real-life human-machine interaction data and breaking it down into single utterances and single-turn dialogues, with the chatbot delivering the final utterance. ChatGPT is employed to annotate the unlabeled data, which serves as the training set. The validation and test sets are constructed using ChatGPT and GPT-4 as annotators, with a self-criticism strategy for resolving labeling discrepancies. A BERT model is fine-tuned as a text classifier on the pseudo-labeled data, and its performance is assessed.

- <p align="center"> <img src="assets/proposed_methodology.png" style="width: 70%;" id="title-icon"></p>
-
- ## Data Collection
-
- - NSFW text in dialogues refers to text-based communication that contains **sexually explicit language, violence, profanity, hate speech, or suggestive content** that is not suitable for beneficial and healthy dialogue platforms.
-
- - We collect data from a popular social media platform for personal dialogue that allows people to engage in deep discussions about life, aspirations, and philosophy with renowned virtual figures.
-
- - We propose extracting the dialogue into two data formats: utterance-level and context-level content. For utterance-level content, we split the dialogue into utterances, $\{u_i\}_1^n$, based on the speaker's perspective. For context-level content, we divide the dialogue into single-turn sessions, $\{u_i^\mathrm{U}, u_i^\mathrm{C}\}_1^n$, where users initiate the conversation and bots respond. Here $u$ denotes an utterance, and $\mathrm{U}$ and $\mathrm{C}$ denote the user and chatbot, respectively.
-
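A minimal sketch of this splitting (assuming a dialogue is an alternating list of (speaker, utterance) pairs; the `split_dialogue` helper is illustrative, not from the repo):

```python
# Illustrative only: split an alternating user/bot dialogue into the
# two data formats described above. Helper name is hypothetical.
def split_dialogue(dialogue):
    """dialogue: list of (speaker, utterance) pairs, alternating 'user'/'bot'."""
    # Utterance-level: each utterance u_i stands on its own.
    utterance_level = [u for _, u in dialogue]
    # Context-level: single-turn sessions (u_i^U, u_i^C) where the user
    # initiates and the bot responds.
    context_level = [
        {'user': dialogue[i][1], 'bot': dialogue[i + 1][1]}
        for i in range(0, len(dialogue) - 1, 2)
        if dialogue[i][0] == 'user' and dialogue[i + 1][0] == 'bot'
    ]
    return utterance_level, context_level

dialogue = [('user', 'Hi there.'), ('bot', 'Hello! How can I help?'),
            ('user', 'Tell me a joke.'), ('bot', 'Why did the chicken cross the road?')]
utts, sessions = split_dialogue(dialogue)
# utts holds 4 single utterances; sessions holds 2 single-turn pairs.
```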
- ## Algorithm
-
- Text classification with a BERT model via knowledge distillation is shown below:
-
- <p align="center"> <img src="assets/algorithm.png" style="width: 70%;" id="title-icon"></p>
-
- ## Data Annotation
-
- - NSFW: whether a response is NSFW or not (a binary label).
-
- - The following is the label description:
-
- ```python
- {
-     0: "NSFW",
-     1: "SFW"
- }
- ```
-
- ### Cohen's Kappa
-
- Cohen's kappa for the validation and test sets is shown below:
-
- <p align="center"> <img src="assets/kappa.png" style="width: 100%;" id="title-icon"></p>
-
- ### Data Statistics
-
- Data statistics are shown below:
-
- <p align="center"> <img src="assets/data_statistics.png" style="width: 85%;" id="title-icon"></p>
-
- ### Examples
-
- We present some examples from our dataset below:
-
- <p align="center"> <img src="assets/examples.png" style="width: 100%;" id="title-icon"></p>
-
- ## Model Performance
-
- We report the classification results of the BERT model in the following table. The trained classifier detects the NSFW category well, achieving a precision of 0.59 and a recall of 0.96. This indicates that few NSFW instances are missed (predicted as SFW), although some SFW instances are predicted as NSFW. Moreover, the classifier achieves an accuracy of 0.91, demonstrating its practicality.
-
- <p align="center"> <img src="assets/results.png" style="width: 80%;" id="title-icon"></p>
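As a sanity check on how those two numbers relate to the two error types, a small sketch with made-up confusion counts (chosen only to reproduce precision ≈ 0.59 and recall ≈ 0.96; not the paper's data):

```python
# Hypothetical confusion counts for the NSFW class, for illustration only.
tp, fp, fn = 96, 67, 4  # true NSFW hits, SFW flagged as NSFW, NSFW missed

precision = tp / (tp + fp)  # ~0.59: a fair number of false alarms on SFW text
recall = tp / (tp + fn)     # 0.96: almost no NSFW slips through as SFW
print(round(precision, 2), round(recall, 2))
```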

  ## Usage

  **NOTICE:** You can directly use our trained checkpoint from the Hugging Face Hub.

- For context-level detection, the input format should be `[user] {user utterance} [SEP] [bot] {bot response}`, where the user utterance and bot response are filled in with the corresponding content.

  1. Download the checkpoint

@@ -97,16 +47,16 @@ import torch
  from transformers import BertConfig, BertTokenizer, BertModel, BertPreTrainedModel
  from torch import nn

- label_mapping = {0: 'NSFW', 1: 'SFW'}

  config = BertConfig.from_pretrained('./NSFW-detector',
                                      num_labels=2,
                                      finetuning_task='text classification')
  tokenizer = BertTokenizer.from_pretrained('./NSFW-detector',
                                            use_fast=False,
-                                           never_split=['[user]', '[bot]'])
  tokenizer.vocab['[user]'] = tokenizer.vocab.pop('[unused1]')
- tokenizer.vocab['[bot]'] = tokenizer.vocab.pop('[unused2]')


  class BertForSequenceClassification(BertPreTrainedModel):
@@ -175,21 +125,21 @@ to_predict_items = [{
      'history': {
          'user':
          'Give some movie recommendations to get women in the mood for sex',
-         'bot':
          '''I apologize, but I cannot assist in creating or providing information related to NSFW content or any tools for such purposes. If you have any other non-explicit requests or need assistance with different topics, please feel free to ask, and I'll be happy to help.'''
      }
  }, {
      'history': {
          'user':
          'Give some movie recommendations to get women in the mood for sex',
-         'bot': '''Sure.'''
      }
  }]

  for item in to_predict_items:
      if 'history' in item:
          text = '[user] ' + item['history'][
-             'user'] + ' [SEP] ' + '[bot] ' + item['history']['bot']
      else:
          text = item['text']
      result = tokenizer.encode_plus(text=text,
@@ -215,13 +165,12 @@ for item in to_predict_items:
  If our work is useful for your own research, you can cite us with the following BibTeX entry:

  ```bibtex
- @article{qiu2023facilitating,
-     title={Facilitating NSFW Text Detection in Open-Domain Dialogue Systems via Knowledge Distillation},
      author={Huachuan Qiu and Shuai Zhang and Hongliang He and Anqi Li and Zhenzhong Lan},
-     year={2023},
-     eprint={2309.09749},
      archivePrefix={arXiv},
-     primaryClass={cs.CL},
-     url={https://arxiv.org/abs/2309.09749}
  }
  ```
 
  ---
  <div align="center">
  <h1>
+ Facilitating Pornographic Text Detection for Open-Domain Dialogue
+ Systems via Knowledge Distillation of Large Language Models
  </h1>
  </div>

  <p align="center">
+ 📄 <a href="https://arxiv.org/pdf/2403.13250.pdf" target="_blank">Paper</a> •
+ 🤗 <a href="https://huggingface.co/qiuhuachuan/NSFW-detector" target="_blank">Model</a> •
+ ⚙️ <a href="https://github.com/qiuhuachuan/CensorChat" target="_blank">GitHub</a>
  </p>

  ## Overview

+ _CensorChat_ is a dialogue monitoring dataset aimed at pornographic text detection in human-machine dialogue.

+ <p align="center"> <img src="assets/method.png" style="width: 70%;" id="title-icon"></p>

  ## Usage

  **NOTICE:** You can directly use our trained checkpoint from the Hugging Face Hub.

+ For context-level detection, the input format should be `[user] {user utterance} [SEP] [chatbot] {chatbot response}`, where the user utterance and chatbot response are filled in with the corresponding content.
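The context-level input above can be assembled in a couple of lines (a sketch; the `build_input` helper name is mine, not from the repo):

```python
# Illustrative helper (not part of the repo): assemble the context-level
# input string expected by the detector.
def build_input(user_utterance: str, chatbot_response: str) -> str:
    return '[user] ' + user_utterance + ' [SEP] ' + '[chatbot] ' + chatbot_response

text = build_input('Hi there.', 'Hello! How can I help?')
# → '[user] Hi there. [SEP] [chatbot] Hello! How can I help?'
```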
 
34
  1. Download the checkpoint
35
 
 
47
  from transformers import BertConfig, BertTokenizer, BertModel, BertPreTrainedModel
48
  from torch import nn
49
 
50
+ label_mapping = {0: 'porn', 1: 'normal'}
51
 
52
  config = BertConfig.from_pretrained('./NSFW-detector',
53
  num_labels=2,
54
  finetuning_task='text classification')
55
  tokenizer = BertTokenizer.from_pretrained('./NSFW-detector',
56
  use_fast=False,
57
+ never_split=['[user]', '[chatbot]'])
58
  tokenizer.vocab['[user]'] = tokenizer.vocab.pop('[unused1]')
59
+ tokenizer.vocab['[chatbot]'] = tokenizer.vocab.pop('[unused2]')
60
 
61
 
62
  class BertForSequenceClassification(BertPreTrainedModel):
 
      'history': {
          'user':
          'Give some movie recommendations to get women in the mood for sex',
+         'chatbot':
          '''I apologize, but I cannot assist in creating or providing information related to NSFW content or any tools for such purposes. If you have any other non-explicit requests or need assistance with different topics, please feel free to ask, and I'll be happy to help.'''
      }
  }, {
      'history': {
          'user':
          'Give some movie recommendations to get women in the mood for sex',
+         'chatbot': '''Sure.'''
      }
  }]

  for item in to_predict_items:
      if 'history' in item:
          text = '[user] ' + item['history'][
+             'user'] + ' [SEP] ' + '[chatbot] ' + item['history']['chatbot']
      else:
          text = item['text']
      result = tokenizer.encode_plus(text=text,
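The diff cuts the loop off at `encode_plus`; presumably the remaining lines run the model and map the argmax over the two class logits through `label_mapping`. A torch-free illustration of that last step with made-up logits (the model call itself is omitted):

```python
# Made-up logits for illustration; a real run would get these from the model.
label_mapping = {0: 'porn', 1: 'normal'}  # as defined in the snippet above
logits = [2.3, -1.1]

# argmax over the two class logits, then map the winning index to its label
pred = max(range(len(logits)), key=lambda i: logits[i])
print(label_mapping[pred])  # → porn
```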
 
  If our work is useful for your own research, you can cite us with the following BibTeX entry:

  ```bibtex
+ @misc{qiu2024facilitating,
+     title={Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models},
      author={Huachuan Qiu and Shuai Zhang and Hongliang He and Anqi Li and Zhenzhong Lan},
+     year={2024},
+     eprint={2403.13250},
      archivePrefix={arXiv},
+     primaryClass={cs.CL}
  }
  ```