youngggggg committed
Commit 8a711aa
1 Parent(s): 888a879

Update README.md

Files changed (1)
  1. README.md +3 -25
README.md CHANGED
@@ -8,6 +8,9 @@ language:
 
 # Model Card for ToxiGen-ConPrompt
 
+**ToxiGen-ConPrompt** is a pre-trained language model for implicit hate speech detection.
+The model is pre-trained on a machine-generated dataset for implicit hate speech detection (i.e., *ToxiGen*) using our proposed pre-training approach (i.e., *ConPrompt*).
+
 <!-- Provide a quick summary of what the model is/does. -->
 
 <!-- {{ model_summary | default("", true) }} -->
@@ -29,7 +32,6 @@ language:
 - **Pre-training Approach:** ConPrompt
 
 <!-- Provide the basic links for the model. -->
-
 - **ConPrompt Repository:** https://github.com/youngwook06/ConPrompt
 - **ConPrompt Paper:** https://aclanthology.org/2023.findings-emnlp.731/
 
@@ -53,27 +55,3 @@ While these behavior can lead to social good e.g., constructing training data fo
 **We strongly emphasize the need for careful handling to prevent unintentional misuse and warn against malicious exploitation of such behaviors.**
 
 
-## Citation
-
-**BibTeX:**
-
-@inproceedings{kim-etal-2023-conprompt,
-    title = "{C}on{P}rompt: Pre-training a Language Model with Machine-Generated Data for Implicit Hate Speech Detection",
-    author = "Kim, Youngwook and
-      Park, Shinwoo and
-      Namgoong, Youngsoo and
-      Han, Yo-Sub",
-    editor = "Bouamor, Houda and
-      Pino, Juan and
-      Bali, Kalika",
-    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
-    month = dec,
-    year = "2023",
-    address = "Singapore",
-    publisher = "Association for Computational Linguistics",
-    url = "https://aclanthology.org/2023.findings-emnlp.731",
-    doi = "10.18653/v1/2023.findings-emnlp.731",
-    pages = "10964--10980",
-    abstract = "Implicit hate speech detection is a challenging task in text classification since no explicit cues (e.g., swear words) exist in the text. While some pre-trained language models have been developed for hate speech detection, they are not specialized in implicit hate speech. Recently, an implicit hate speech dataset with a massive number of samples has been proposed by controlling machine generation. We propose a pre-training approach, ConPrompt, to fully leverage such machine-generated data. Specifically, given a machine-generated statement, we use example statements of its origin prompt as positive samples for contrastive learning. Through pre-training with ConPrompt, we present ToxiGen-ConPrompt, a pre-trained language model for implicit hate speech detection. We conduct extensive experiments on several implicit hate speech datasets and show the superior generalization ability of ToxiGen-ConPrompt compared to other pre-trained models. Additionally, we empirically show that ConPrompt is effective in mitigating identity term bias, demonstrating that it not only makes a model more generalizable but also reduces unintended bias. We analyze the representation quality of ToxiGen-ConPrompt and show its ability to consider target group and toxicity, which are desirable features in terms of implicit hate speeches.",
-}
-
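
Since the card added in this commit describes a pre-trained encoder, here is a minimal sketch of how one might load it with the Hugging Face `transformers` library. The repository id `youngggggg/ToxiGen-ConPrompt` and the assumption that the checkpoint is a BERT-style encoder are inferred from the committer and model names, not stated in this diff; adjust them to the actual model page.

```python
# Hedged sketch, not documented usage from the model card.
# ASSUMPTIONS: the checkpoint is hosted at "youngggggg/ToxiGen-ConPrompt"
# (hypothetical repository id) and is a BERT-style encoder loadable by AutoModel.
from transformers import AutoModel, AutoTokenizer

model_id = "youngggggg/ToxiGen-ConPrompt"  # hypothetical repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode one statement; for implicit hate speech detection the encoder would
# typically be fine-tuned with a classification head on a labeled dataset.
inputs = tokenizer("an example statement to encode", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```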
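
The abstract quoted in the removed citation describes the core ConPrompt idea: statements generated from the same origin prompt serve as positive samples for contrastive learning. The snippet below is only an illustrative InfoNCE-style rendering of that idea under my own assumptions (loss form, temperature, batching); the actual objective is defined in the ConPrompt paper and repository linked above.

```python
# Illustrative sketch of prompt-origin contrastive learning, NOT the ConPrompt
# implementation. Statements sharing an origin prompt are treated as positives.
import torch
import torch.nn.functional as F

def prompt_contrastive_loss(embeddings, prompt_ids, temperature=0.05):
    """embeddings: (batch, dim) sentence embeddings.
    prompt_ids: (batch,) id of the origin prompt of each generated statement."""
    z = F.normalize(embeddings, dim=-1)
    sim = z @ z.t() / temperature                          # pairwise cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))        # exclude self-pairs
    positives = (prompt_ids.unsqueeze(0) == prompt_ids.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_log_prob = log_prob.masked_fill(~positives, 0.0)   # keep only positive pairs
    pos_counts = positives.sum(dim=1)
    has_pos = pos_counts > 0                               # anchors with >= 1 positive
    loss = -(pos_log_prob.sum(dim=1)[has_pos] / pos_counts[has_pos])
    return loss.mean()

# Toy usage: four statements, the first two from prompt 0, the last two from prompt 1.
emb = torch.randn(4, 768)
ids = torch.tensor([0, 0, 1, 1])
print(prompt_contrastive_loss(emb, ids))
```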