metadata

language:
  - en
license:
  - mit
library_name: pytorch
multilinguality:
  - monolingual
pretty_name: perturber
datasets:
  - panda
tags:
  - counterfactual
  - perturb
  - fairness
  - nlp
  - demographic
  - diverse
  - gender
  - non-binary
  - race
  - age
metrics:
  - bleu

The Perturber

The perturber is a seq2seq controlled generation model that rewrites text along a specified demographic axis and attribute.

The perturber takes in (i) a source text snippet, (ii) a word in the snippet referring to a demographic group, and (iii) a new target demographic attribute, and generates a perturbed snippet that refers to the target demographic attribute, while preserving overall meaning.

Repository: https://github.com/facebookresearch/ResponsibleNLP/
Paper: https://aclanthology.org/2022.emnlp-main.646/
Point of Contact: rebeccaqian@meta.com, ccross@meta.com, douwe@huggingface.co, adinawilliams@meta.com
License: MIT

Model Description

The perturber is a finetuned BART model (Lewis et al., 2020) with 24 layers, a hidden size of 1024, 16 attention heads, and 406M parameters. In the original paper, we trained the perturber by finetuning BART on PANDA with the ParlAI library.

This model release was trained separately with the HuggingFace transformers library, using the same parameters as the ParlAI model.

Uses

The perturber is intended for use by fairness researchers and engineers working on demographic debiasing applications. It is a controllable generation model: given a selected word, a target demographic attribute, and an input text, it outputs text in which the selected word and associated references are rewritten to the target demographic attribute. The control variables and the input text are separated by a <PERT_SEP> token.
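As a rough illustration of this usage, the sketch below runs one already-formatted input through a seq2seq checkpoint with the transformers library. The function name `perturb`, the default checkpoint identifier `facebook/perturber`, and the `max_new_tokens` value are assumptions made for this example and are not stated in this card; substitute the identifier of the checkpoint you actually use.

```python
def perturb(formatted_input: str,
            model_name: str = "facebook/perturber",  # assumed checkpoint id, not given in this card
            max_new_tokens: int = 128) -> str:
    """Generate a perturbed rewrite for one pre-formatted input string."""
    # Imported lazily so the sketch can be loaded without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Tokenize the control variables and text as a single sequence.
    inputs = tokenizer(formatted_input, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode the first (and only) generated sequence back to text.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

Calling `perturb("his, woman <PERT_SEP> Jack was passionate about rock climbing.")` downloads the checkpoint on first use; the generated text depends on the checkpoint and decoding settings.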

Examples

Below we show some example inputs and outputs for the perturber rewriting text along different demographic axes and attributes.

Model inputs follow the format [selected_word], [target_attribute] <PERT_SEP> [input_text], where selected_word is a word in the text that carries demographic information, target_attribute is a demographic attribute such as "man" or "asian", and input_text is the text sequence to rewrite.
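A minimal sketch of assembling an input string in this format is shown here. The helper name `build_perturber_input` is hypothetical, and the comma-and-space layout between the two control variables is taken from the example inputs in this card rather than from a formal specification.

```python
PERT_SEP = "<PERT_SEP>"  # separator token between control variables and text


def build_perturber_input(selected_word: str, target_attribute: str, input_text: str) -> str:
    """Join the two control variables and the text into one input string.

    Hypothetical helper: the layout mirrors the example inputs in this card.
    """
    return f"{selected_word}, {target_attribute} {PERT_SEP} {input_text}"


example = build_perturber_input(
    "Asian",
    "black",
    "The Asian students association often hosted anime nights and boba events on campus.",
)
print(example)
```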

Currently the perturber supports text rewriting along three axes and several attributes:

gender: man, woman, non-binary
race: black, white, asian, hispanic, native-american, pacific-islander
age: child, young, middle-aged, senior, adult
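The axes and attributes listed above can be kept in a small lookup table to check a target attribute before building an input. This is only an illustrative sketch: the names `SUPPORTED_ATTRIBUTES` and `axis_for_attribute` are hypothetical, and the case-insensitive matching is an assumption of this example, not documented model behavior.

```python
# Axes and attributes copied from the list in this card.
SUPPORTED_ATTRIBUTES = {
    "gender": ["man", "woman", "non-binary"],
    "race": ["black", "white", "asian", "hispanic", "native-american", "pacific-islander"],
    "age": ["child", "young", "middle-aged", "senior", "adult"],
}


def axis_for_attribute(attribute: str) -> str:
    """Return the axis that a target attribute belongs to, or raise ValueError."""
    key = attribute.strip().lower()  # assumption: match attributes case-insensitively
    for axis, attributes in SUPPORTED_ATTRIBUTES.items():
        if key in attributes:
            return axis
    raise ValueError(f"Unsupported target attribute: {attribute!r}")


print(axis_for_attribute("senior"))
```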

Gender

Input: his, woman <PERT_SEP> Jack was passionate about rock climbing and his love for the sport was infectious to all men around him.

Output: Jackie was passionate about rock climbing and her love for the sport was infectious to all men around her.

Input: Alice, man <PERT_SEP> To her girlfriend Jen, Alice was a doting mother, loving girlfriend and talented actress.

Output: To his girlfriend Jen, Alan was a doting father, loving partner and talented actor.

Input: his, non-binary <PERT_SEP> Jack was passionate about rock climbing and his love for the sport was infectious to all men around him.

Output: Jack was passionate about rock climbing and their love for the sport was infectious to all men around them.

Age

Input: child, senior <PERT_SEP> The young child is naive and his innocence must be protected at all costs.

Output: The elderly person is naive and his innocence must be protected at all costs.

Race/Ethnicity

Input: Asian, black <PERT_SEP> The Asian students association often hosted anime nights and boba events on campus.

Output: The Black students association often hosted anime nights and boba events on campus.

Bias, Risks & Limitations

Limitations of the perturber include inherent biases in demographic categorization, data sourcing and crowdsourced data collection, and the ambiguous nature of fairness and perturbability. Ambiguous instances include names, where annotators may have different preconceptions about whether they contain ethnic information. Our crowdworkers and researchers are primarily English speaking and US-based, which may introduce additional cultural biases.

For an in-depth discussion of bias, risks and limitations, see the Limitations section of our paper.

Citation

@inproceedings{qian-etal-2022-perturbation,
    title = "Perturbation Augmentation for Fairer {NLP}",
    author = "Qian, Rebecca  and
      Ross, Candace  and
      Fernandes, Jude  and
      Smith, Eric Michael  and
      Kiela, Douwe  and
      Williams, Adina",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-main.646",
    pages = "9496--9521",
    abstract = "Unwanted and often harmful social biases are becoming ever more salient in NLP research, affecting both models and datasets. In this work, we ask whether training on demographically perturbed data leads to fairer language models. We collect a large dataset of human annotated text perturbations and train a neural perturbation model, which we show outperforms heuristic alternatives. We find that (i) language models (LMs) pre-trained on demographically perturbed corpora are typically more fair, and (ii) LMs finetuned on perturbed GLUE datasets exhibit less demographic bias on downstream tasks, and (iii) fairness improvements do not come at the expense of performance on downstream tasks. Lastly, we discuss outstanding questions about how best to evaluate the (un)fairness of large language models. We hope that this exploration of neural demographic perturbation will help drive more improvement towards fairer NLP.",
}

Model Card Contact

Thanks to @Rebecca-Qian for adding this model.