
Honor Model Card

Freddie Zhang

Last updated: December 2022

Inspired by Model Cards for Model Reporting (Mitchell et al.), this model card provides accompanying information about the Honor model.

Model Details

Honor is a binary text classification model built with BertForSequenceClassification. This model was built to explore possibilities for zero-shot classification of texts in a wide range of domains.

Model date

December 2022

Model type

Transformer-based binary text classifier

Model version

v.1

Paper & samples

A research paper and samples are currently unavailable for this project.

Model Use

The intended direct users of Honor are untrained readers who access its capabilities via the Hugging Face Hub. Through the Hub, the model can be used by those without AI development experience to efficiently and accurately detect machine-generated text.
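Because Honor is a BertForSequenceClassification binary classifier, its raw output is a pair of logits per input. The minimal sketch below is an illustration, not the released model's code: the logit values and the mapping of label index 1 to "machine-generated" are assumptions made for the example.

```python
import torch

# Hypothetical logits from a BertForSequenceClassification forward pass,
# shape [batch_size, 2]; index 1 is assumed here to mean "machine-generated".
logits = torch.tensor([[0.3, 2.1]])

# Softmax turns the two logits into class probabilities; argmax picks the label.
probs = torch.softmax(logits, dim=-1)
label = int(torch.argmax(probs, dim=-1))

print(label)  # 1, i.e. classified as machine-generated under the assumed mapping
```

On the Hub, this decision would typically be exposed to untrained users through the `transformers` text-classification pipeline rather than raw logits.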

Data, Performance, and Limitations

Data

The GPT-3 training dataset is composed of text posted or uploaded to the internet (e.g., books). The internet data it has been trained on and evaluated against to date includes The Pile, a dataset compiled by Gao et al. that combines 22 smaller, high-quality datasets.

Given its training data, Honor’s performance is more representative of internet-connected populations than those steeped in verbal, non-digital culture. The internet-connected population is more representative of developed countries, wealthy, younger, and male views, and is mostly U.S.-centric. Wealthier nations and populations in developed countries show higher internet penetration.[1] The digital gender divide also shows fewer women represented online worldwide.[2] Additionally, because different parts of the world have different levels of internet penetration and access, the dataset underrepresents less connected communities.[3]

Performance

Honor's performance has been evaluated and validated on detection across the 22 text domains represented in its training data.

Limitations

Honor has a number of limitations. Some of these limitations are inherent to any model with machine learning (ML) components that can have high-bandwidth, open-ended interactions with people (e.g. via natural language): ML components have limited robustness; ML components are biased; open-ended systems have large surface areas for risk; and safety is a moving target for ML systems. Like any model with ML components, it can only be expected to provide reasonable detection when given inputs similar to the ones present in its training data.

Lack of world grounding: Honor, like other large pretrained language models, is not grounded in other modalities of experience, such as video, real-world physical interaction, or human feedback, and thus lacks a large amount of context about the world.[4]

Predominantly English: Honor is trained largely on text in the English language, and is best suited for classifying such text. Honor will by default perform worse on inputs that are different from the data distribution it is trained on, including non-English languages as well as specific dialects of English that are not as well-represented in training data.

Interpretability & predictability: The capacity to interpret or predict how Honor will behave is very limited, a limitation common to most deep learning systems.

Creation date of training corpora: The December 2022 version of GPT-3 was trained on a dataset created in December 2020, and so has not been trained on any data more recent than that.

Biases: GPT-3, like all large language models trained on internet corpora, is biased. The model has the propensity to retain and magnify biases it inherited from any part of its training, from the datasets the authors selected to the training techniques they chose.[5]

Where to send questions or comments about the model

Please use the Community section of the model's Hugging Face page.


[1] International Telecommunication Union (ITU), World Telecommunication/ICT Indicators Database. "Individuals using the Internet (% of population)." https://data.worldbank.org/indicator/IT.NET.USER.ZS?end=2018&start=2002.

[2] Organisation for Economic Co-operation and Development. "Bridging the Digital Gender Divide." http://www.oecd.org/internet/bridging-the-digital-gender-divide.pdf.

[3] Telecommunication Development Bureau. "Manual for Measuring ICT Access and Use by Households and Individuals." https://www.itu.int/pub/D-IND-ITCMEAS-2014.

[4] Bisk, Yonatan, et al. Experience Grounds Language. arXiv preprint arXiv:2004.10151, 2020.

[5] Crawford, Kate. The Trouble with Bias. NeurIPS 2017 Keynote, 2017.