Domain,Subdomain,Indicator,Definition,Notes,Reference_1,Reference_2,Link_1,Link_2
Upstream,Data,Data size,,,Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science,Datasheets for Datasets,https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00041/43452/Data-Statements-for-Natural-Language-Processing,https://arxiv.org/abs/1803.09010
Upstream,Data,Data sources,,,Datasheets for Datasets,Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure,https://arxiv.org/abs/1803.09010,https://arxiv.org/abs/2010.13561
Upstream,Data,Data creators ,,,Datasheets for Datasets,Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure,https://arxiv.org/abs/1803.09010,https://arxiv.org/abs/2010.13561
Upstream,Data,Data source selection,Are the selection protocols for including and excluding data sources disclosed?,Selection protocols refer to procedures used to choose which datasets or subsets of datasets will be used to build a model. We will award this point even if the selection protocols are non-exhaustive.,Datasheets for Datasets,Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure,https://arxiv.org/abs/1803.09010,https://arxiv.org/abs/2010.13561
Upstream,Data,Data curation,,,Datasheets for Datasets,Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure,https://arxiv.org/abs/1803.09010,https://arxiv.org/abs/2010.13561
Upstream,Data,Data augmentation,Are any steps the developer takes to augment its data sources disclosed?,Such steps might include augmenting data sources with synthetic data. We will award this point if the developer reports that it does not take any steps to augment its data.,Datasheets for Datasets,Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure,https://arxiv.org/abs/1803.09010,https://arxiv.org/abs/2010.13561
Upstream,Data,Harmful data filtration,,Such harmful content might relate to violence or child sexual abuse material. We will award this point if the developer reports that it does not perform any harmful data filtration.,Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus,,https://aclanthology.org/2021.emnlp-main.98/,https://arxiv.org/abs/2305.13169
Upstream,Data,Copyrighted data,,,,Machine Learning and Artificial Intelligence: Legal Concepts,https://arxiv.org/abs/2105.05241,https://genlaw.github.io/glossary.html#legal-concepts
Upstream,Data,Data license,,,,Machine Learning and Artificial Intelligence: Legal Concepts,https://arxiv.org/abs/2105.05241,https://genlaw.github.io/glossary.html#legal-concepts
Upstream,Data,Personal information in data,,,Data Capitalism: Redefining the Logics of Surveillance and Privacy,What Does it Mean for a Language Model to Preserve Privacy?,https://journals.sagepub.com/doi/10.1177/0007650317718185,https://arxiv.org/abs/2202.05520
Upstream,Data labor,Use of human labor,Are the phases of the data pipeline where human labor is involved disclosed?,,The future of crowd work,,https://dl.acm.org/doi/10.1145/2441776.2441923,https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots
Upstream,Data labor,Employment of data laborers,Is the organization that directly employs the people involved in data labor disclosed for each phase of the data pipeline?,,The future of crowd work,,https://dl.acm.org/doi/10.1145/2441776.2441923,https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots
Upstream,Data labor,Geographic distribution of data laborers,Is geographic information regarding the people involved in data labor disclosed for each phase of the data pipeline?,This indicator is inclusive of all data that is created by or on behalf of the developer. We will award this point if the developer gives a reasonable best-effort description of the geographic distribution of labor at the country-level.,Cleaning Up ChatGPT Takes Heavy Toll on Human Workers,Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass,https://www.wsj.com/articles/chatgpt-openai-content-abusive-sexually-explicit-harassment-kenya-workers-on-human-workers-cf191483,https://ghostwork.info/
Upstream,Data labor,Wages,Are the wages for people who perform data labor disclosed?,,The future of crowd work,,https://dl.acm.org/doi/10.1145/2441776.2441923,https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots
Upstream,Data labor,Instructions for creating data,Are the instructions given to people who perform data labor disclosed?,This indicator is inclusive of all data that is created by or on behalf of the developer. We will award this point if the developer makes a reasonable best-effort attempt to disclose instructions given to people who create data used to build the model for the bulk of the data phases involving human labor.,,The future of crowd work,https://dl.acm.org/doi/10.1145/3411764.3445518,https://dl.acm.org/doi/10.1145/2441776.2441923
Upstream,Data labor,Labor protections,Are the labor protections for people who perform data labor disclosed?,,,Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass,https://www.jstor.org/stable/j.ctv1ghv45t,https://ghostwork.info/
Upstream,Data labor,Third party partners,Are the third parties who were or are involved in the development of the model disclosed?,This indicator is inclusive of partnerships that go beyond data labor as there may be third party partners at various stages in the model development process. We will award this point if the developer reports that it was the sole entity involved in the development of the model.,,Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass,https://www.jstor.org/stable/j.ctv1ghv45t,https://ghostwork.info/
Upstream,Data access,Queryable external data access,Are external entities provided with queryable access to the data used to build the model?,,Datasheets for Datasets,The ROOTS Search Tool: Data Transparency for LLMs,https://arxiv.org/abs/1803.09010,https://arxiv.org/abs/2302.14035
Upstream,Data access,Direct external data access,Are external entities provided with direct access to the data used to build the model?,,Datasheets for Datasets,The ROOTS Search Tool: Data Transparency for LLMs,https://arxiv.org/abs/1803.09010,https://arxiv.org/abs/2302.14035
Upstream,Compute,Compute usage,Is the compute required for building the model disclosed?,,Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning,Energy and Policy Considerations for Deep Learning in NLP,https://arxiv.org/abs/2002.05651,https://arxiv.org/abs/1906.02243
Upstream,Compute,Development duration,Is the amount of time required to build the model disclosed?,,Compute Trends Across Three Eras of Machine Learning,Training Compute-Optimal Large Language Models,https://arxiv.org/abs/2202.05924,https://arxiv.org/abs/2203.15556
Upstream,Compute,Compute hardware,,,Compute Trends Across Three Eras of Machine Learning,Training Compute-Optimal Large Language Models,https://arxiv.org/abs/2202.05924,https://arxiv.org/abs/2203.15556
Upstream,Compute,Hardware owner,,,Compute Trends Across Three Eras of Machine Learning,Training Compute-Optimal Large Language Models,https://arxiv.org/abs/2202.05924,https://arxiv.org/abs/2203.15556
Upstream,Compute,Energy usage,Is the amount of energy expended in building the model disclosed?,,Quantifying the Carbon Emissions of Machine Learning,Carbon Emissions and Large Neural Network Training,https://arxiv.org/abs/1910.09700,https://arxiv.org/abs/2104.10350
Upstream,Compute,Carbon emissions,Is the amount of carbon emitted (associated with the energy used) in building the model disclosed?,,Quantifying the Carbon Emissions of Machine Learning,Carbon Emissions and Large Neural Network Training,https://arxiv.org/abs/1910.09700,https://arxiv.org/abs/2104.10350
Upstream,Compute,Broader environmental impact,Are any broader environmental impacts from building the model besides carbon emissions disclosed?,,Counting Carbon: A Survey of Factors Influencing the Emissions of Machine Learning,Energy and Policy Considerations for Deep Learning in NLP,https://arxiv.org/abs/2302.08476,https://arxiv.org/abs/1906.02243
Upstream,Methods,Model stages,Are all stages in the model development process disclosed?,,Model Cards for Model Reporting,Scaling Instruction-Finetuned Language Models,https://arxiv.org/abs/1810.03993,https://arxiv.org/abs/2210.11416
Upstream,Methods,Model objectives,,,Model Cards for Model Reporting,Scaling Instruction-Finetuned Language Models,https://arxiv.org/abs/1810.03993,https://arxiv.org/abs/2210.11416
Upstream,Methods,Core frameworks,Are the core frameworks used for model development disclosed?,,Model Cards for Model Reporting,Scaling Instruction-Finetuned Language Models,https://arxiv.org/abs/1810.03993,https://arxiv.org/abs/2210.11416
Upstream,Methods,Additional dependencies,,,Analyzing Leakage of Personally Identifiable Information in Language Models,ProPILE: Probing Privacy Leakage in Large Language Models,https://arxiv.org/abs/2302.00539,https://arxiv.org/abs/2307.01881
Upstream,Data Mitigations,Mitigations for privacy,Are any steps the developer takes to mitigate the presence of PII in the data disclosed?,,Deduplicating Training Data Mitigates Privacy Risks in Language Models,Machine Learning and Artificial Intelligence: Legal Concepts,https://proceedings.mlr.press/v162/kandpal22a.html,https://genlaw.github.io/glossary.html#legal-concepts
Upstream,Data Mitigations,Mitigations for copyright,Are any steps the developer takes to mitigate the presence of copyrighted information in the data disclosed?,,,Machine Learning and Artificial Intelligence: Legal Concepts,https://arxiv.org/abs/2105.05241,https://genlaw.github.io/glossary.html#legal-concepts
Model,Model basics,Input modality,Are the input modalities for the model disclosed?,,Model Cards for Model Reporting,Interactive Model Cards: A Human-Centered Approach to Model Documentation,https://arxiv.org/abs/1810.03993,https://arxiv.org/abs/2205.02894
Model,Model basics,Output modality,Are the output modalities for the model disclosed?,,Model Cards for Model Reporting,Interactive Model Cards: A Human-Centered Approach to Model Documentation,https://arxiv.org/abs/1810.03993,https://arxiv.org/abs/2205.02894
Model,Model basics,Model components,Are all components of the model disclosed?,,Model Cards for Model Reporting,Interactive Model Cards: A Human-Centered Approach to Model Documentation,https://arxiv.org/abs/1810.03993,https://arxiv.org/abs/2205.02894
Model,Model basics,Model size,,,Model Cards for Model Reporting,Interactive Model Cards: A Human-Centered Approach to Model Documentation,https://arxiv.org/abs/1810.03993,https://arxiv.org/abs/2205.02894
Model,Model basics,Model architecture,Is the model architecture disclosed?,,Model Cards for Model Reporting,Interactive Model Cards: A Human-Centered Approach to Model Documentation,https://arxiv.org/abs/1810.03993,https://arxiv.org/abs/2205.02894
Model,Model basics,Centralized model documentation,Is key information about the model included in a centralized artifact such as a model card?,,Model Cards for Model Reporting,Interactive Model Cards: A Human-Centered Approach to Model Documentation,https://arxiv.org/abs/1810.03993,https://arxiv.org/abs/2205.02894
Model,Model access,External model access protocol,Is a protocol for granting external entities access to the model disclosed?,,The Gradient of Generative AI Release: Methods and Considerations,Structured access: an emerging paradigm for safe AI deployment,https://arxiv.org/abs/2302.04844,https://arxiv.org/abs/2201.05159
Model,Model access,Blackbox external model access,Is black box model access provided to external entities?,,The Gradient of Generative AI Release: Methods and Considerations,Structured access: an emerging paradigm for safe AI deployment,https://arxiv.org/abs/2302.04844,https://arxiv.org/abs/2201.05159
Model,Model access,Full external model access,Is full model access provided to external entities?,,The Gradient of Generative AI Release: Methods and Considerations,Structured access: an emerging paradigm for safe AI deployment,https://arxiv.org/abs/2302.04844,https://arxiv.org/abs/2201.05159
Model,Capabilities,Capabilities description,Are the model's capabilities described?,"Capabilities refer to the specific and distinctive functions that the model can perform. We recognize that different developers may use different terminology for capabilities, or conceptualize capabilities differently. We will award this point for any clear, but potentially incomplete, description of the multiple capabilities.",Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models,Holistic Evaluation of Language Models,https://arxiv.org/abs/2206.04615,https://openreview.net/forum?id=iO4LZibEqW
Model,Capabilities,Capabilities demonstration,Are the model’s capabilities demonstrated?,"Demonstrations refer to illustrative examples or other forms of showing the model's capabilities that are legible or understandable for the general public, without requiring specific technical expertise. We recognize that different developers may use different terminology for capabilities, or conceptualize capabilities differently. We will award this point for clear demonstrations of multiple capabilities.",,,,
Model,Capabilities,Capabilities evaluation,"Are the model’s capabilities rigorously evaluated, with the results of these evaluations reported prior to or concurrent with the initial release of the model?","Rigorous evaluations refer to precise quantifications of the model's behavior in relation to its capabilities. We recognize that capabilities may not perfectly align with evaluations, and that different developers may associate capabilities with evaluations differently. We will award this point for clear evaluations of multiple capabilities. For example, this may include evaluations of world knowledge, reasoning, state tracking or other such proficiencies. Or it may include the measurement of average performance (e.g. accuracy, F1) on benchmarks for specific tasks (e.g. text summarization, image captioning). We note that evaluations on standard broad-coverage benchmarks are likely to suffice for this indicator, though they may not if the model's capabilities are presented as especially unusual such that standard evaluations will not suffice.",,,,
Model,Capabilities,External reproducibility of capabilities evaluation,Are the capabilities evaluations reproducible by external entities?,"For an evaluation to be reproducible by an external entity, we mean that the associated data is either (i) publicly available or (ii) described sufficiently such that a reasonable facsimile can be constructed by an external entity. In addition, the evaluation protocol should be sufficiently described such that if the evaluation is reproduced, any discrepancies with the developer's results can be resolved. We recognize that there does not exist an authoritative or consensus standard for what is required for an evaluation to be deemed externally reproducible. Evaluations on standard benchmarks are assumed to be sufficiently reproducible for the purposes of this index. We will award this point for reproducibility of multiple disclosed evaluations. In the event that an evaluation is not reproducible, a justification by the model developer for why it is not possible for the evaluation to be made reproducible may be sufficient to score this point.",Leakage and the reproducibility crisis in machine-learning-based science,Holistic Evaluation of Language Models,https://www.cell.com/patterns/fulltext/S2666-3899(23)00159-9?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2666389923001599%3Fshowall%3Dtrue,https://openreview.net/forum?id=iO4LZibEqW
Model,Capabilities,Third party capabilities evaluation,Are the model’s capabilities evaluated by third parties?,"By third party, we mean entities that are significantly or fully independent of the developer. We will award this point if (i) a third party has conducted an evaluation of model capabilities, (ii) the results of this evaluation are publicly available, and (iii) these results are disclosed or referred to in the developer’s materials.",Outsider Oversight: Designing a Third Party Audit Ecosystem for AI Governance,Holistic Evaluation of Language Models,https://dl.acm.org/doi/10.1145/3514094.3534181,https://openreview.net/forum?id=iO4LZibEqW
Model,Limitations,Limitations description,Are the model's limitations disclosed?,,The Fallacy of AI Functionality,Holistic Evaluation of Language Models,https://dl.acm.org/doi/abs/10.1145/3531146.3533158,https://openreview.net/forum?id=iO4LZibEqW
Model,Limitations,Limitations demonstration,Are the model’s limitations demonstrated?,,The Fallacy of AI Functionality,Holistic Evaluation of Language Models,https://dl.acm.org/doi/abs/10.1145/3531146.3533158,https://openreview.net/forum?id=iO4LZibEqW
Model,Limitations,Third party evaluation of limitations,Can the model’s limitations be evaluated by third parties?,,Outsider Oversight: Designing a Third Party Audit Ecosystem for AI Governance,Holistic Evaluation of Language Models,https://dl.acm.org/doi/10.1145/3514094.3534181,https://openreview.net/forum?id=iO4LZibEqW
Model,Risks,Risks description,Are the model's risks disclosed?,"Risks refer to possible negative consequences or undesirable outcomes that can arise from the model's deployment and usage. This indicator requires disclosure of risks that may arise in the event of both (i) intentional (though possibly careless) use, such as bias or hallucinations and (ii) malicious use, such as fraud or disinformation. We recognize that different developers may use different terminology for risks, or conceptualize risks differently. We will award this point for any clear, but potentially incomplete, description of multiple risks.",,,,
Model,Risks,Risks demonstration,Are the model’s risks demonstrated?,"Demonstrations refer to illustrative examples or other forms of showing the risks that are legible or understandable for the general public, without requiring specific technical expertise. This indicator requires demonstration of risks that may arise in the event of both (i) intentional (though possibly careless) use, such as biases or hallucinations and (ii) malicious use, such as fraud or disinformation. We recognize that different developers may use different terminology for risks, or conceptualize risks differently. We will award this point for clear demonstrations of multiple risks.",,,,
Model,Risks,Unintentional harm evaluation,"Are the model’s risks related to unintentional harm rigorously evaluated, with the results of these evaluations reported prior to or concurrent with the initial release of the model?","Rigorous evaluations refer to precise quantifications of the model's behavior in relation to such risks. Unintentional harms include bias, toxicity, and issues relating to fairness. We recognize that unintended harms may not perfectly align with risk evaluations, and that different developers may associate risks with evaluations differently. We will award this point for clear evaluations of multiple such risks. We note that evaluations on standard broad-coverage benchmarks are likely to suffice for this indicator, though they may not if the model's risks related to unintentional harm are presented as especially unusual or severe.",,,,
Model,Risks,External reproducibility of unintentional harm evaluation,Are the evaluations of the model’s risks related to unintentional harm reproducible by external entities?,"For an evaluation to be reproducible by an external entity, we mean that the associated data is either (i) publicly available or (ii) described sufficiently such that a reasonable facsimile can be constructed by the external entity. In addition, the evaluation protocol should be sufficiently described such that if the evaluation is reproduced, any discrepancies with the developer's results can be resolved. We recognize that there does not exist an authoritative or consensus standard for what is required for an evaluation to be deemed externally reproducible. Evaluations on standard benchmarks are assumed to be sufficiently reproducible for the purposes of this index. We will award this point for reproducibility of multiple disclosed evaluations. In the event that an evaluation is not reproducible, a justification by the developer for why it is not possible for the evaluation to be made reproducible may suffice.",Leakage and the reproducibility crisis in machine-learning-based science,Ethical and social risks of harm from Language Models,https://www.cell.com/patterns/fulltext/S2666-3899(23)00159-9?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2666389923001599%3Fshowall%3Dtrue,https://arxiv.org/abs/2112.04359
Model,Risks,Intentional harm evaluation,"Are the model’s risks related to intentional harm rigorously evaluated, with the results of these evaluations reported prior to or concurrent with the initial release of the model?","Rigorous evaluations refer to precise quantifications of the model's behavior in relation to such risks. Intentional harms include fraud, disinformation, scams, cybersecurity attacks, designing weapons or pathogens, and uses of the model for illegal purposes. We recognize that intentional harms may not perfectly align with risk evaluations, and that different developers may associate risks with evaluations differently. We will award this point for clear evaluations of multiple such risks. We note that evaluations on standard broad-coverage benchmarks are likely to suffice for this indicator, though they may not if the model's risks related to intentional harm are presented as especially unusual or severe.",Evaluating the Social Impact of Generative AI Systems in Systems and Society,Ethical and social risks of harm from Language Models,https://arxiv.org/abs/2306.05949,https://arxiv.org/abs/2112.04359
Model,Risks,External reproducibility of intentional harm evaluation,Are the evaluations of the model’s risks related to intentional harm reproducible by external entities?,"For an evaluation to be reproducible by an external entity, we mean that the associated data is either (i) publicly available or (ii) described sufficiently such that a reasonable facsimile can be constructed by the external entity. In addition, the evaluation protocol should be sufficiently described such that if the evaluation is reproduced, any discrepancies with the developer's results can be resolved. We recognize that there does not exist an authoritative or consensus standard for what is required for an evaluation to be deemed externally reproducible. Evaluations on standard benchmarks are assumed to be sufficiently reproducible for the purposes of this index. We will award this point for reproducibility of multiple disclosed evaluations. In the event that an evaluation is not reproducible, a justification by the model developer for why it is not possible for the evaluation to be made reproducible may suffice.",,,,
Model,Risks,Third party risks evaluation,Are the model’s risks evaluated by third parties?,"By third party, we mean entities that are significantly or fully independent of the developer. A third party risk evaluation might involve the developer allowing a third party to choose a methodology for evaluating risks that differs from that of the developer. We will award this point if (i) a third party has conducted an evaluation of model risks, (ii) the results of this evaluation are publicly available, and (iii) these results are disclosed or referred to in the developer’s materials. If the results are not made public (but are disclosed to have been conducted) and/or the results are not discoverable in the developer’s materials, we will not award this point. We may accept a justification from either the third party or the developer for why part of the evaluation is not disclosed in relation to risks.",,,,
Model,Model Mitigations,Mitigations description,Are the model mitigations disclosed?,"By model mitigations, we refer to interventions implemented by the developer at the level of the model to reduce the likelihood and/or the severity of the model’s risks. We recognize that different developers may use different terminology for mitigations, or conceptualize mitigations differently. We will award this point for any clear, but potentially incomplete, description of multiple mitigations associated with the model's risks. Alternatively, we will award this point if the developer reports that it does not mitigate risk.",Evaluating the Social Impact of Generative AI Systems in Systems and Society,Ethical and social risks of harm from Language Models,https://arxiv.org/abs/2306.05949,https://arxiv.org/abs/2112.04359
Model,Model Mitigations,Mitigations demonstration,Are the model mitigations demonstrated?,"Demonstrations refer to illustrative examples or other forms of showing the mitigations that are legible or understandable for the general public, without requiring specific technical expertise. We recognize that different developers may use different terminology for mitigations, or conceptualize mitigations differently. We will award this point for clear demonstrations of multiple mitigations. We will also award this point if the developer reports that it does not mitigate the risks associated with the model.",Evaluating the Social Impact of Generative AI Systems in Systems and Society,Ethical and social risks of harm from Language Models,https://arxiv.org/abs/2306.05949,https://arxiv.org/abs/2112.04359
Model,Model Mitigations,Mitigations evaluation,"Are the model mitigations rigorously evaluated, with the results of these evaluations reported?",Rigorous evaluations refer to precise quantifications of the model's behavior in relation to the mitigations associated with its risks. We will award this point for clear evaluations of multiple mitigations.,Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation,Ethical and social risks of harm from Language Models,https://arxiv.org/abs/2310.06987,https://arxiv.org/abs/2112.04359
Model,Model Mitigations,External reproducibility of mitigations evaluation,Are the model mitigation evaluations reproducible by external entities?,,Leakage and the reproducibility crisis in machine-learning-based science,Ethical and social risks of harm from Language Models,https://www.cell.com/patterns/fulltext/S2666-3899(23)00159-9?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2666389923001599%3Fshowall%3Dtrue,https://arxiv.org/abs/2112.04359
Model,Model Mitigations,Third party mitigations evaluation,Can the model mitigations be evaluated by third parties?,,Outsider Oversight: Designing a Third Party Audit Ecosystem for AI Governance,Ethical and social risks of harm from Language Models,https://dl.acm.org/doi/10.1145/3514094.3534181,https://arxiv.org/abs/2112.04359
Model,Trustworthiness,Trustworthiness evaluation,,,Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims,DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models,https://arxiv.org/abs/2004.07213,https://arxiv.org/abs/2306.11698
Model,Trustworthiness,External reproducibility of trustworthiness evaluation,Are the trustworthiness evaluations reproducible by external entities?,,Leakage and the reproducibility crisis in machine-learning-based science,,https://www.cell.com/patterns/fulltext/S2666-3899(23)00159-9?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2666389923001599%3Fshowall%3Dtrue,https://dl.acm.org/doi/10.1145/3419764
Model,Inference,Inference duration evaluation,Is the time required for model inference disclosed for a clearly-specified task on a clearly-specified set of hardware?,,MLPerf Inference Benchmark,Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs,https://arxiv.org/abs/1911.02549,https://arxiv.org/abs/2305.02440
Model,Inference,Inference compute evaluation,Is the compute usage for model inference disclosed for a clearly-specified task on a clearly-specified set of hardware?,,MLPerf Inference Benchmark,Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs,https://arxiv.org/abs/1911.02549,https://arxiv.org/abs/2305.02440
Downstream,Distribution,Release decision-making,Is the developer’s protocol for deciding whether or not to release a model disclosed?,,The Gradient of Generative AI Release: Methods and Considerations,The Time Is Now to Develop Community Norms for the Release of Foundation Models,https://arxiv.org/abs/2302.04844,https://hai.stanford.edu/news/time-now-develop-community-norms-release-foundation-models
Downstream,Distribution,Release process,Is a description of the process of how the model was released disclosed?,,The Gradient of Generative AI Release: Methods and Considerations,The Time Is Now to Develop Community Norms for the Release of Foundation Models,https://arxiv.org/abs/2302.04844,https://hai.stanford.edu/news/time-now-develop-community-norms-release-foundation-models
Downstream,Distribution,Distribution channels,Are all distribution channels disclosed?,,Understanding accountability in algorithmic supply chains,Thinking Upstream: Ethics and Policy Opportunities in AI Supply Chains,https://dl.acm.org/doi/10.1145/3593013.3594073,https://arxiv.org/abs/2303.07529
Downstream,Distribution,Products and services,Does the developer disclose whether any products and services offered by the developer are dependent on the model?,We recognize that a developer may provide many products and services that depend on a foundation model or internal derivatives of the model. We will award this point for a reasonable best-effort description of any ways the developer makes internal use of the model in its products or services.,Understanding accountability in algorithmic supply chains,On AI Deployment: AI supply chains (and why they matter),https://dl.acm.org/doi/10.1145/3593013.3594073,https://aipolicy.substack.com/p/supply-chains-2
Downstream,Distribution,Detection of machine-generated content,Are any mechanisms for detecting content generated by this model disclosed?,,A Watermark for Large Language Models,Robust Distortion-free Watermarks for Language Models,https://arxiv.org/abs/2301.10226,https://www.semanticscholar.org/paper/Robust-Distortion-free-Watermarks-for-Language-Kuditipudi-Thickstun/ccaff61e0c1e629d91d78f82a64b3cbc8f3f7023
Downstream,Distribution,Model License,Is a license for the model disclosed?,,,An investigation of licensing of datasets for machine learning based on the GQM model,https://arxiv.org/abs/2305.18615,https://arxiv.org/abs/2303.13735
Downstream,Distribution,Terms of service,Are terms of service disclosed for each distribution channel?,We will award this point if there are terms of service that appear to apply to the bulk of the model’s distribution channels.,Terms-we-Serve-with: a feminist-inspired social imaginary for improved transparency and engagement in AI,Identifying Terms and Conditions Important to Consumers using Crowdsourcing,https://arxiv.org/abs/2206.02492,https://arxiv.org/abs/2111.12182
Downstream,Usage policy,Permitted and prohibited users,Is a description of who can and cannot use the model disclosed?,,Best Practices for Deploying Language Models,Meta Platform Terms,https://txt.cohere.com/best-practices-for-deploying-language-models/,https://developers.facebook.com/terms/#datause
Downstream,Usage policy,,,,Best Practices for Deploying Language Models,Meta Platform Terms,https://txt.cohere.com/best-practices-for-deploying-language-models/,https://developers.facebook.com/terms/#datause
Downstream,Usage policy,Usage policy enforcement,Is the enforcement protocol for the usage policy disclosed?,,Best Practices for Deploying Language Models,Meta Platform Terms,https://txt.cohere.com/best-practices-for-deploying-language-models/,https://developers.facebook.com/terms/#datause
Downstream,Usage policy,Justification for enforcement action,Do users receive a justification when they are subject to an enforcement action for violating the usage policy?,,Best Practices for Deploying Language Models,Meta Platform Terms,https://txt.cohere.com/best-practices-for-deploying-language-models/,https://developers.facebook.com/terms/#datause
Downstream,Usage policy,Usage policy violation appeals mechanism,Is a mechanism for appealing potential usage policy violations disclosed?,,Best Practices for Deploying Language Models,Meta Platform Terms,https://txt.cohere.com/best-practices-for-deploying-language-models/,https://developers.facebook.com/terms/#datause
Downstream,Model behavior policy,,,,I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models,,https://arxiv.org/abs/2306.03423,https://arxiv.org/abs/2310.03693
Downstream,Model behavior policy,Model behavior policy enforcement,Is the enforcement protocol for the model behavior policy disclosed?,,Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims,,https://arxiv.org/abs/2004.07213,https://arxiv.org/abs/2310.03693
Downstream,Model behavior policy,Interoperability of usage and model behavior policies,Is the way that the usage policy and the model behavior policy interoperate disclosed?,,I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models,,https://arxiv.org/abs/2306.03423,https://arxiv.org/abs/2310.03693
Downstream,User Interface,User interaction with AI system,,,Designing Responsible AI: Adaptations of UX Practice to Meet Responsible AI Challenges,Towards Responsible AI: A Design Space Exploration of Human-Centered Artificial Intelligence User Interfaces to Investigate Fairness,https://dl.acm.org/doi/10.1145/3544548.3581278,https://arxiv.org/abs/2206.00474
Downstream,User Interface,Usage disclaimers,,,Designing Responsible AI: Adaptations of UX Practice to Meet Responsible AI Challenges,Towards Responsible AI: A Design Space Exploration of Human-Centered Artificial Intelligence User Interfaces to Investigate Fairness,https://dl.acm.org/doi/10.1145/3544548.3581278,https://arxiv.org/abs/2206.00474
Downstream,User data protection,User data protection policy,,We will also award this point if the developer reports that it has no user data protection policy.,Privacy as Contextual Integrity,Redesigning Data Privacy: Reimagining Notice & Consent for human technology interaction,https://digitalcommons.law.uw.edu/wlr/vol79/iss1/10/,https://www.weforum.org/reports/redesigning-data-privacy-reimagining-notice-consent-for-humantechnology-interaction/
Downstream,User data protection,Permitted and prohibited use of user data,Are permitted and prohibited uses of user data disclosed?,,Privacy as Contextual Integrity,Redesigning Data Privacy: Reimagining Notice & Consent for human technology interaction,https://digitalcommons.law.uw.edu/wlr/vol79/iss1/10/,https://www.weforum.org/reports/redesigning-data-privacy-reimagining-notice-consent-for-humantechnology-interaction/
Downstream,User data protection,Usage data access protocol,Is a protocol for granting external entities access to usage data disclosed?,,How Cambridge Analytica Sparked the Great Privacy Awakening,Redesigning Data Privacy: Reimagining Notice & Consent for human technology interaction,https://www.wired.com/story/cambridge-analytica-facebook-privacy-awakening/,https://www.weforum.org/reports/redesigning-data-privacy-reimagining-notice-consent-for-humantechnology-interaction/
Downstream,Model Updates,Versioning protocol,Is there a disclosed version and versioning protocol for the model?,,How is ChatGPT's behavior changing over time?,Putting the Semantics into Semantic Versioning,https://arxiv.org/abs/2307.09009,https://arxiv.org/abs/2008.07069
Downstream,Model Updates,Change log,Is there a disclosed change log for the model?,"By change log, we mean a description associated with each change to the model (which should be indicated by a change in version number). We recognize that different developers may adopt different practices for change logs that may differ from practices used elsewhere in software engineering. We will award this point if the change log provides a clear description of changes that is legible to a technical audience.",How is ChatGPT's behavior changing over time?,Watch out for This Commit! A Study of Influential Software Changes,https://arxiv.org/abs/2307.09009,https://arxiv.org/abs/1606.03266
Downstream,Model Updates,Deprecation policy,Is there a disclosed deprecation policy for the developer?,,How is ChatGPT's behavior changing over time?,Automatic Android Deprecated-API Usage Update by Learning from Single Updated Example,https://arxiv.org/abs/2307.09009,https://arxiv.org/abs/2005.13220
Downstream,Feedback,Feedback mechanism,Is a feedback mechanism disclosed?,"By feedback mechanism, we refer to a means for external entities to report feedback or issues that arise in relation to the foundation model. Such entities may include but are not necessarily limited to users. We will award this point if the developer discloses a feedback mechanism that has been implemented.",Ecosystem Graphs: The Social Footprint of Foundation Models,Outsider Oversight: Designing a Third Party Audit Ecosystem for AI Governance,https://www.semanticscholar.org/paper/Ecosystem-Graphs%3A-The-Social-Footprint-of-Models-Bommasani-Soylu/8ed7c9ba7cdb33e816135381ca502ace649c7985,https://dl.acm.org/doi/10.1145/3514094.3534181
Downstream,Feedback,Feedback summary,"Is a report or summary disclosed regarding the feedback the developer received or, alternatively, the way the developer responded to that feedback?","We recognize that there does not exist an authoritative or consensus standard for what is required in a feedback report. For this reason, we will award this point if there is a meaningful, though potentially vague or incomplete, summary of feedback received.",Achieving Transparency Report Privacy in Linear Time,Evaluating a Methodology for Increasing AI Transparency: A Case Study,https://arxiv.org/abs/2104.00137,https://arxiv.org/abs/2201.13224
Downstream,Feedback,Government inquiries,Is a summary of government inquiries related to the model received by the developer disclosed?,"Such government inquiries might include requests for user data, requests that certain content be banned, or requests for information about a developer’s business practices. We recognize that there does not exist an authoritative or consensus standard for what is required for such a summary of government inquiries. For this reason, we will award this point if (i) there is a meaningful, though potentially vague or incomplete, summary of government inquiries, or (ii) a summary of government inquiries related to user data.",Transparency Report: Government requests on the rise,Ecosystem Graphs: The Social Footprint of Foundation Models,https://blog.google/technology/safety-security/transparency-report-government-requests/,https://www.semanticscholar.org/paper/Ecosystem-Graphs%3A-The-Social-Footprint-of-Models-Bommasani-Soylu/8ed7c9ba7cdb33e816135381ca502ace649c7985
Downstream,Impact,Monitoring mechanism,"For each distribution channel, is a monitoring mechanism for tracking model use disclosed?","By monitoring mechanism, we refer to a specific protocol for tracking model use that goes beyond an acknowledgement that usage data is collected. We will also award this point for a reasonable best-effort attempt to describe monitoring mechanisms, or if a developer discloses that a distribution channel is not monitored.",Progressive Disclosure: Designing for Effective Transparency,Ecosystem Graphs: The Social Footprint of Foundation Models,https://arxiv.org/abs/1811.02164,https://www.semanticscholar.org/paper/Ecosystem-Graphs%3A-The-Social-Footprint-of-Models-Bommasani-Soylu/8ed7c9ba7cdb33e816135381ca502ace649c7985
Downstream,Impact,Downstream applications,"Across all forms of downstream use, is the number of applications dependent on the foundation model disclosed?","We recognize that there does not exist an authoritative or consensus standard for what qualifies as an application. We will award this point if there is a meaningful estimate of the number of downstream applications, along with some description of what it means for an application to be dependent on the model.",Market concentration implications of foundation models: The Invisible Hand of ChatGPT,Ecosystem Graphs: The Social Footprint of Foundation Models,https://www.brookings.edu/articles/market-concentration-implications-of-foundation-models-the-invisible-hand-of-chatgpt/,https://www.semanticscholar.org/paper/Ecosystem-Graphs%3A-The-Social-Footprint-of-Models-Bommasani-Soylu/8ed7c9ba7cdb33e816135381ca502ace649c7985
Downstream,Impact,Affected market sectors,"Across all downstream applications, is the fraction of applications corresponding to each market sector disclosed?","By market sector, we refer to an identifiable part of the economy. While established standards exist for describing market sectors, we recognize that developers may provide vague or informal characterizations of market impact. We will award this point if there is a meaningful, though potentially vague or incomplete, summary of affected market sectors.",Market concentration implications of foundation models: The Invisible Hand of ChatGPT,Ecosystem Graphs: The Social Footprint of Foundation Models,https://www.brookings.edu/articles/market-concentration-implications-of-foundation-models-the-invisible-hand-of-chatgpt/,https://www.semanticscholar.org/paper/Ecosystem-Graphs%3A-The-Social-Footprint-of-Models-Bommasani-Soylu/8ed7c9ba7cdb33e816135381ca502ace649c7985
Downstream,Impact,Affected individuals,"Across all forms of downstream use, is the number of individuals affected by the foundation model disclosed?","By affected individuals, we principally mean the number of potential users of applications. We recognize that there does not exist an authoritative or consensus standard for what qualifies as an affected individual. We will award this point if there is a meaningful estimate of the number of affected individuals along with a clear description of what it means for an individual to be affected by the model.",Market concentration implications of foundation models: The Invisible Hand of ChatGPT,Ecosystem Graphs: The Social Footprint of Foundation Models,https://www.brookings.edu/articles/market-concentration-implications-of-foundation-models-the-invisible-hand-of-chatgpt/,https://www.semanticscholar.org/paper/Ecosystem-Graphs%3A-The-Social-Footprint-of-Models-Bommasani-Soylu/8ed7c9ba7cdb33e816135381ca502ace649c7985
Downstream,Impact,Usage reports,Is a usage report that gives usage statistics describing the impact of the model on users disclosed?,"We recognize that there does not exist an authoritative or consensus standard for what is required in a usage report. Usage statistics might include, for example, a description of the major categories of harm that has been caused by use of the model. We will award this point if there is a meaningful, though potentially vague or incomplete, summary of usage statistics.",Expert explainer: Allocating accountability in AI supply chains,Ecosystem Graphs: The Social Footprint of Foundation Models,https://www.adalovelaceinstitute.org/resource/ai-supply-chains/,https://www.semanticscholar.org/paper/Ecosystem-Graphs%3A-The-Social-Footprint-of-Models-Bommasani-Soylu/8ed7c9ba7cdb33e816135381ca502ace649c7985
Downstream,Impact,Geographic statistics,"Across all forms of downstream use, are statistics of model usage across geographies disclosed?","We will award this point if there is a meaningful, though potentially incomplete or vague, disclosure of geographic usage statistics at the country-level.",Expert explainer: Allocating accountability in AI supply chains,Ecosystem Graphs: The Social Footprint of Foundation Models,https://www.adalovelaceinstitute.org/resource/ai-supply-chains/,https://www.semanticscholar.org/paper/Ecosystem-Graphs%3A-The-Social-Footprint-of-Models-Bommasani-Soylu/8ed7c9ba7cdb33e816135381ca502ace649c7985
Downstream,Impact,Redress mechanism,Is any mechanism to provide redress to users for harm disclosed?,We will also award this point if the developer reports it does not have any such redress mechanism.,Computational Power and AI,Ecosystem Graphs: The Social Footprint of Foundation Models,https://ainowinstitute.org/publication/policy/compute-and-ai,https://www.semanticscholar.org/paper/Ecosystem-Graphs%3A-The-Social-Footprint-of-Models-Bommasani-Soylu/8ed7c9ba7cdb33e816135381ca502ace649c7985
Downstream,Documentation for Deployers,Centralized documentation for downstream use,Is documentation for downstream use consolidated in a centralized artifact?,"Centralized documentation for downstream use refers to an artifact, or closely-linked artifacts, that consolidate relevant information for making use of or repurposing the model. Examples of these kinds of artifacts include a website with dedicated documentation information, a GitHub repository with dedicated documentation information, and an ecosystem card. We recognize that different developers may take different approaches to centralizing information. We will award this point if there is a clearly-identified artifact(s) that contains the majority of substantive information (e.g. capabilities, limitations, risks, evaluations, distribution channels, model license, usage policies, model behavior policies, feedback and redress mechanisms, dependencies).",Datasheets for Datasets,Model Cards for Model Reporting,https://arxiv.org/abs/1803.09010,https://arxiv.org/abs/1810.03993
Downstream,Documentation for Deployers,Documentation for responsible downstream use,Is documentation for responsible downstream use disclosed?,"Such documentation might include details on how to adjust API settings to promote responsible use, descriptions of how to implement mitigations, or guidelines for responsible use. We will also award this point if the developer states that it does not provide any such documentation. For example, the developer might state that the model is offered as is and downstream developers are accountable for using the model responsibly.",Ecosystem Graphs: The Social Footprint of Foundation Models,Expert explainer: Allocating accountability in AI supply chains,https://www.semanticscholar.org/paper/Ecosystem-Graphs%3A-The-Social-Footprint-of-Models-Bommasani-Soylu/8ed7c9ba7cdb33e816135381ca502ace649c7985,https://www.adalovelaceinstitute.org/resource/ai-supply-chains/