Edit model card


Model description

This bert-base-uncased model has been fine-tuned on the Wiki Neutrality Corpus (WNC) - a parallel corpus of 180,000 biased and neutralized sentence pairs along with contextual sentences and metadata. The model can be used to classify text as subjectively biased vs. neutrally toned.

The development and modeling efforts that produced this model are documented in detail through this blog series.

Intended uses & limitations

The model is intended purely as a research output for NLP and data science communities. We developed this model for the purpose of evaluating text style transfer output. Specifically, we derive a Style Transfer Intensity (STI) metric from the classifier's output distributions. We also extract feautre importances from the model via Integrated Gradients with support a Content Preservation Score (CPS).

We imagine this model will be used by researchers to better understand the limitations, robustness, and generalization of text style transfer models. Ultimately, we hope this model will inspire future work on text style transfer and serve as a benchmarking tool for the style attribute of subjectivity bias, specifically.

Any production use of this model - whether commercial or not - is currently not intended. This is because, as the team at OpenAI points out, large langauge models like BERT reflect biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans, unless the deployers first carry out a study of biases relevant to the intended use-case. Neither the model nor the WNC dataset has been sufficiently evaluated for performance and bias.

As we discuss in the blog series, since the WNC is a parallel dataset and we formulate the learning task as a supervised problem, the model indirectly adopts Wikipedia's NPOV policy as the definition for "neutrality" and "subjectivity". The NPOV policy may not fully reflect an end users assumed/intended meaning of subjectivity because the notion of subjectivity itself can be...well, subjective.

We discovered through our exploratory work that the WNC does contain data quality issues that will contribute to unintended bias in the model. For example, some NPOV revisions introduce factual information outside the context of the prompt as a means to correct bias. We believe these factual based edits are out of scope for a subjective-to-neutral style transfer modeling task, but exist here nonetheless.

How to use

This model can be used directly with a HuggingFace pipeline for text2text-generation.

>>> from transformers import pipeline

>>> classify = pipeline(
>>> input_text = "chemical abstracts service (cas), a prominent division of the american chemical society, is the world's leading source of chemical information."
>>> classify(input_text)

[[{'label': 'SUBJECTIVE', 'score': 0.9765084385871887},
  {'label': 'NEUTRAL', 'score': 0.023491567000746727}]]

Training procedure

For training, we initialize HuggingFaceโ€™s AutoModelforSequenceClassification with bert-base-uncased pre-trained weights and perform a hyperparameter search over: batch size [16, 32], learning rate [3e-05, 3e-06, 3e-07], weight decay [0, 0.01, 0.1] and batch shuffling [True, False] while training for 15 epochs.

We monitor performance using accuracy as we have a perfectly balanced dataset and assign equal cost to false positives and false negatives. The best performing model produces an overall accuracy of 72.50% -- please reference our training script and classifier evaluation notebook for further details.

Downloads last month

Space using cffl/bert-base-styleclassification-subjective-neutral 1