|
--- |
|
license: mit |
|
datasets: |
|
- skg/toxigen-data |
|
language: |
|
- en |
|
--- |
|
|
|
# Model Card for ToxiGen-ConPrompt |
|
|
|
ToxiGen-ConPrompt is a BERT-base-uncased model pre-trained on the machine-generated ToxiGen dataset with the ConPrompt approach for implicit hate speech detection.
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
|
|
|
ToxiGen-ConPrompt is a pre-trained language model for implicit hate speech detection. It is built by pre-training BERT-base-uncased on the machine-generated ToxiGen dataset with ConPrompt, a contrastive pre-training approach that uses example statements from a machine-generated statement's origin prompt as positive samples.
|
|
|
- **Developed by:** Youngwook Kim, Shinwoo Park, Youngsoo Namgoong, and Yo-Sub Han
|
- **Model type:** Feature Extraction |
|
- **Base Model:** BERT-base-uncased |
|
- **Pre-training Source:** ToxiGen |
|
- **Pre-training Approach:** ConPrompt |
|
|
|
|
|
|
- **ConPrompt Repository:** https://github.com/youngwook06/ConPrompt |
|
- **ConPrompt Paper:** https://aclanthology.org/2023.findings-emnlp.731/ |
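
Since ToxiGen-ConPrompt is a feature-extraction model built on BERT-base-uncased, it can be loaded with the Hugging Face `transformers` library. The sketch below is a minimal example, not the paper's evaluation code: the repository ID is a placeholder to replace with this model's actual path, and mean pooling is one common choice for sentence embeddings rather than a pooling the paper prescribes.

```python
# Minimal sketch: extracting sentence embeddings with ToxiGen-ConPrompt.
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical repository ID -- replace with the actual path of this model.
model_id = "youngwook06/ToxiGen-ConPrompt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

sentences = [
    "Example statement one.",
    "Example statement two.",
]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool token embeddings over non-padding positions (one common choice;
# [CLS] pooling is an alternative).
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # torch.Size([2, 768]) for a BERT-base encoder
```

The resulting embeddings can then be fed to a downstream classifier, or the model can be fine-tuned end-to-end for implicit hate speech detection.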
|
|
|
|
|
## Ethical Considerations |
|
### Privacy Issue |
|
Before pre-training, we found that the machine-generated statements in ToxiGen contain some private information, such as URLs.

We anonymized such private information before pre-training to prevent potential harm.

The anonymization code we used is available in `preprocess_toxigen.ipynb`, and we strongly encourage anonymizing private information before using machine-generated data for pre-training.
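
For illustration, URL masking of the kind described above can be done with a simple regular-expression pass, as in the sketch below. The pattern and placeholder token here are assumptions for demonstration; the exact anonymization we applied is in `preprocess_toxigen.ipynb`.

```python
# Illustrative sketch of URL anonymization; the actual preprocessing we applied
# is in preprocess_toxigen.ipynb, and this regex/placeholder are assumptions.
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

def anonymize_urls(text: str, placeholder: str = "[URL]") -> str:
    """Replace every URL-like span in `text` with a placeholder token."""
    return URL_PATTERN.sub(placeholder, text)

print(anonymize_urls("Check https://example.com/user/123 for details."))
# -> "Check [URL] for details."
```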
|
|
|
### Potential Misuse |
|
The pre-training source of ToxiGen-ConPrompt includes toxic statements. |
|
While we deliberately use such toxic statements to pre-train a better model for implicit hate speech detection, the pre-trained model requires careful handling.

Here, we describe some behaviors that could lead to misuse, so that our model is used for social good rather than misused unintentionally or maliciously.
|
|
|
- Because our model was trained with the MLM objective, it might generate toxic statements through its MLM head.

- Because our model learned representations of implicit hate speech, it might retrieve similar toxic statements given a toxic statement.
|
|
|
While these behaviors can serve social good, e.g., constructing training data for hate speech classifiers, they can also be misused.
|
|
|
**We strongly emphasize the need for careful handling to prevent unintentional misuse and warn against malicious exploitation of such behaviors.** |
|
|
|
|
|
## Citation |
|
|
|
**BibTeX:** |
|
|
|
```bibtex
@inproceedings{kim-etal-2023-conprompt,
    title = "{C}on{P}rompt: Pre-training a Language Model with Machine-Generated Data for Implicit Hate Speech Detection",
    author = "Kim, Youngwook and
      Park, Shinwoo and
      Namgoong, Youngsoo and
      Han, Yo-Sub",
    editor = "Bouamor, Houda and
      Pino, Juan and
      Bali, Kalika",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-emnlp.731",
    doi = "10.18653/v1/2023.findings-emnlp.731",
    pages = "10964--10980",
    abstract = "Implicit hate speech detection is a challenging task in text classification since no explicit cues (e.g., swear words) exist in the text. While some pre-trained language models have been developed for hate speech detection, they are not specialized in implicit hate speech. Recently, an implicit hate speech dataset with a massive number of samples has been proposed by controlling machine generation. We propose a pre-training approach, ConPrompt, to fully leverage such machine-generated data. Specifically, given a machine-generated statement, we use example statements of its origin prompt as positive samples for contrastive learning. Through pre-training with ConPrompt, we present ToxiGen-ConPrompt, a pre-trained language model for implicit hate speech detection. We conduct extensive experiments on several implicit hate speech datasets and show the superior generalization ability of ToxiGen-ConPrompt compared to other pre-trained models. Additionally, we empirically show that ConPrompt is effective in mitigating identity term bias, demonstrating that it not only makes a model more generalizable but also reduces unintended bias. We analyze the representation quality of ToxiGen-ConPrompt and show its ability to consider target group and toxicity, which are desirable features in terms of implicit hate speeches.",
}
```
|
|
|
|