This repository contains models that generate adversarial examples for two tasks: hate speech detection and natural language inference (NLI). All models use the GPT-2 causal language model as their base architecture. The hate speech models are trained on the DynaHate dataset, and the NLI models are trained on AdversarialNLI. Further details can be found in this paper.

These models are intended solely for testing and improving the robustness of neural classifiers.
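
As a rough illustration of the robustness-testing use case, the sketch below measures how often a classifier is fooled by a batch of generated examples. It is a minimal sketch under stated assumptions, not the evaluation procedure from the paper: `attack_success_rate`, `toy_classifier`, and the hand-written NLI-style strings are all hypothetical stand-ins, and in practice the examples would come from one of the models in this repository and the classifier would be the neural model under test.

```python
from typing import Callable, Iterable, Tuple


def attack_success_rate(
    classify: Callable[[str], str],
    examples: Iterable[Tuple[str, str]],
) -> float:
    """Return the fraction of (text, gold_label) pairs the classifier gets wrong."""
    examples = list(examples)
    if not examples:
        raise ValueError("at least one example is required")
    fooled = sum(1 for text, gold in examples if classify(text) != gold)
    return fooled / len(examples)


def toy_classifier(text: str) -> str:
    """Hypothetical keyword-based stand-in for the classifier under test."""
    # Predicts "contradiction" whenever the word "not" appears, else "entailment".
    return "contradiction" if " not " in f" {text.lower()} " else "entailment"


# Hypothetical placeholder examples: premise and hypothesis joined in one string,
# paired with the label the example is meant to have.
generated = [
    ("A man is sleeping. He is not awake.", "entailment"),
    ("A dog runs outside. The dog is indoors.", "contradiction"),
    ("A woman reads a book. She is reading.", "entailment"),
    ("The shop is closed. The shop is not open.", "entailment"),
]

rate = attack_success_rate(toy_classifier, generated)
print(f"Fraction of examples misclassified: {rate:.2f}")
```

A higher rate suggests the classifier relies on shortcuts the generated examples exploit; the misclassified examples can then be added to the classifier's training data to improve robustness.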
