title: Accuracy
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
tags:
  - evaluate
  - metric
description: >-
  Accuracy is the proportion of correct predictions among the total number of
  cases processed. It can be computed with: Accuracy = (TP + TN) / (TP + TN + FP
  + FN) Where: TP: True positive TN: True negative FP: False positive FN: False
  negative

Metric Card for Accuracy

Metric Description

Accuracy is the proportion of correct predictions among the total number of cases processed. It can be computed with:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Where:

  • TP: True positive
  • TN: True negative
  • FP: False positive
  • FN: False negative
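As a minimal sketch of the formula above, the following pure-Python function counts the four confusion-matrix cells for a binary task and combines them. The helper name `binary_accuracy` and its `positive_label` parameter are illustrative assumptions, not part of the `evaluate` API.

```python
def binary_accuracy(references, predictions, positive_label=1):
    """Accuracy from confusion-matrix counts (hypothetical helper, binary labels only)."""
    tp = tn = fp = fn = 0
    for ref, pred in zip(references, predictions):
        if pred == positive_label:
            if ref == positive_label:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if ref == positive_label:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return (tp + tn) / (tp + tn + fp + fn)


score = binary_accuracy(references=[0, 1, 1, 0], predictions=[0, 1, 0, 0])
print(score)  # 0.75 (TP=1, TN=2, FP=0, FN=1)
```

For multi-class labels the same idea reduces to counting the positions where the prediction equals the reference and dividing by the number of examples.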

How to Use

At minimum, this metric requires predictions and references as inputs.

>>> import evaluate
>>> accuracy_metric = evaluate.load("accuracy")
>>> results = accuracy_metric.compute(references=[0, 1], predictions=[0, 1])
>>> print(results)
{'accuracy': 1.0}

Inputs

  • predictions (list of int): Predicted labels.
  • references (list of int): Ground truth labels.
  • normalize (boolean): If set to False, returns the number of correctly classified samples. Otherwise, returns the fraction of correctly classified samples. Defaults to True.
  • sample_weight (list of float): Sample weights. Defaults to None.

Output Values

  • accuracy (float or int): Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0 if normalize is set to True, or the number of input examples if normalize is set to False. A higher score means higher accuracy.

Output Example(s):

{'accuracy': 1.0}

This metric outputs a dictionary containing the accuracy score.

Values from Popular Papers

Top-1 or top-5 accuracy is often used to report performance on supervised classification tasks such as image classification (e.g. on ImageNet) or sentiment analysis (e.g. on IMDB).
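To make the top-k terminology concrete, here is a small pure-Python sketch. It assumes each example comes with one score per class and counts a prediction as correct when the reference label is among the k highest-scoring classes. The function `top_k_accuracy` is a hypothetical helper written for illustration; the accuracy metric described in this card takes a single predicted label per example and does not compute top-k scores itself.

```python
def top_k_accuracy(references, scores, k=5):
    """Fraction of examples whose reference label is among the k top-scoring classes."""
    hits = 0
    for ref, row in zip(references, scores):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if ref in top_k:
            hits += 1
    return hits / len(references)


# Three examples, three classes; the values are made up for the illustration.
scores = [
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.2, 0.3, 0.5],
]
references = [2, 0, 0]

top1 = top_k_accuracy(references, scores, k=1)  # 1 of 3 examples is a hit
top2 = top_k_accuracy(references, scores, k=2)  # 2 of 3 examples are hits
print(top1, top2)
```

Top-1 accuracy coincides with the plain accuracy of the highest-scoring class, which is what this metric reports when given those labels as predictions.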

Examples

Example 1-A simple example

>>> accuracy_metric = evaluate.load("accuracy")
>>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
>>> print(results)
{'accuracy': 0.5}

Example 2-The same as Example 1, except with normalize set to False.

>>> accuracy_metric = evaluate.load("accuracy")
>>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], normalize=False)
>>> print(results)
{'accuracy': 3.0}

Example 3-The same as Example 1, except with sample_weight set.

>>> accuracy_metric = evaluate.load("accuracy")
>>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
>>> print(results)
{'accuracy': 0.8778625954198473}
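The three results above can be checked by hand. The sketch below assumes that the weighted score is the total weight of the correctly classified samples divided by the total weight of all samples, and that normalize=False returns the numerator alone. That assumption reproduces the values shown in Examples 1 to 3. The function `weighted_accuracy` is a hypothetical stand-in, not the library's implementation.

```python
def weighted_accuracy(references, predictions, sample_weight=None, normalize=True):
    """Hand-rolled accuracy with optional per-sample weights (illustrative only)."""
    if sample_weight is None:
        # Without weights every sample counts equally.
        sample_weight = [1.0] * len(references)
    correct = sum(
        w for ref, pred, w in zip(references, predictions, sample_weight) if ref == pred
    )
    if not normalize:
        return correct  # (weighted) count of correct predictions
    return correct / sum(sample_weight)


references = [0, 1, 2, 0, 1, 2]
predictions = [0, 1, 1, 2, 1, 0]
weights = [0.5, 2, 0.7, 0.5, 9, 0.4]

plain = weighted_accuracy(references, predictions)
count = weighted_accuracy(references, predictions, normalize=False)
weighted = weighted_accuracy(references, predictions, sample_weight=weights)
print(plain)     # 0.5
print(count)     # 3.0
print(weighted)  # approximately 0.8779 (11.5 / 13.1)
```

The weighted score is much higher than the unweighted one because the sample with weight 9 is classified correctly and dominates the total.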

Limitations and Bias

This metric can easily be misleading, especially when classes are unbalanced. For example, a high accuracy might mean that a model is doing well, but if the data is unbalanced, it might also mean that the model is only labeling the high-frequency class correctly. In such cases, a more detailed analysis of the model's behavior, or the use of a different metric entirely, is necessary to determine how well the model is actually performing.
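A small synthetic illustration of this pitfall, using made-up data: 95 of 100 examples belong to class 0, and a degenerate model always predicts class 0. Accuracy looks high even though the model never identifies the minority class. Recall on the minority class is computed here only as one example of a complementary check.

```python
# Synthetic, unbalanced data: 95 examples of class 0 and 5 of class 1.
references = [0] * 95 + [1] * 5
# A degenerate "model" that always predicts the majority class.
predictions = [0] * 100

accuracy = sum(ref == pred for ref, pred in zip(references, predictions)) / len(references)

# Recall on the minority class: the fraction of class-1 examples that were found.
minority_hits = sum(1 for ref, pred in zip(references, predictions) if ref == 1 and pred == 1)
minority_recall = minority_hits / references.count(1)

print(accuracy)         # 0.95
print(minority_recall)  # 0.0
```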

Citation(s)

@article{scikit-learn,
  title={Scikit-learn: Machine Learning in {P}ython},
  author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
         and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
         and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
         Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
  journal={Journal of Machine Learning Research},
  volume={12},
  pages={2825--2830},
  year={2011}
}

Further References