This repo contains models for generating adversarial examples for hate speech detection and natural language inference (NLI). All models use the GPT-2 causal language model as their base architecture. The hate speech models are trained on the DynaHate dataset, and the NLI models are trained on AdversarialNLI. Further details can be found in this paper.
These models are intended only for testing and improving the robustness of neural classifiers.