evaluate-bot committed on
Commit
7332944
1 Parent(s): 5b3c908

Update Space (evaluate main: 828c6327)

Files changed (4)
  1. README.md +103 -4
  2. accuracy.py +106 -0
  3. app.py +6 -0
  4. requirements.txt +4 -0
README.md CHANGED
@@ -1,12 +1,111 @@
  ---
  title: Accuracy
- emoji: 🏢
- colorFrom: yellow
- colorTo: green
  sdk: gradio
  sdk_version: 3.0.2
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference
  ---
  title: Accuracy
+ emoji: 🤗
+ colorFrom: blue
+ colorTo: red
  sdk: gradio
  sdk_version: 3.0.2
  app_file: app.py
  pinned: false
+ tags:
+ - evaluate
+ - metric
  ---
 
+ # Metric Card for Accuracy
+
+ ## Metric Description
+
+ Accuracy is the proportion of correct predictions among the total number of cases processed. It can be computed with:
+ Accuracy = (TP + TN) / (TP + TN + FP + FN)
+ Where:
+ TP: True positive
+ TN: True negative
+ FP: False positive
+ FN: False negative
+
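The formula above can be sketched as a small helper (illustrative only; the function name and the confusion counts are hypothetical, not part of the module):

```python
def accuracy_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total

# e.g. 8 true positives, 85 true negatives, 4 false positives, 3 false negatives
print(accuracy_from_counts(8, 85, 4, 3))  # 0.93
```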
+ ## How to Use
+
+ At minimum, this metric requires predictions and references as inputs.
+
+ ```python
+ >>> accuracy_metric = evaluate.load("accuracy")
+ >>> results = accuracy_metric.compute(references=[0, 1], predictions=[0, 1])
+ >>> print(results)
+ {'accuracy': 1.0}
+ ```
+
+ ### Inputs
+ - **predictions** (`list` of `int`): Predicted labels.
+ - **references** (`list` of `int`): Ground truth labels.
+ - **normalize** (`boolean`): If set to False, returns the number of correctly classified samples. Otherwise, returns the fraction of correctly classified samples. Defaults to True.
+ - **sample_weight** (`list` of `float`): Sample weights. Defaults to None.
+
+
+ ### Output Values
+ - **accuracy** (`float` or `int`): Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0 if `normalize` is set to `True`, or the number of examples input if `normalize` is set to `False`. A higher score means higher accuracy.
+
+ Output Example(s):
+ ```python
+ {'accuracy': 1.0}
+ ```
+
+ This metric outputs a dictionary containing the accuracy score.
+
+ #### Values from Popular Papers
+
+ Top-1 or top-5 accuracy is often used to report performance on supervised classification tasks such as image classification (e.g. on [ImageNet](https://paperswithcode.com/sota/image-classification-on-imagenet)) or sentiment analysis (e.g. on [IMDB](https://paperswithcode.com/sota/text-classification-on-imdb)).
+
+ ### Examples
+
+ Example 1: A simple example
+ ```python
+ >>> accuracy_metric = evaluate.load("accuracy")
+ >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
+ >>> print(results)
+ {'accuracy': 0.5}
+ ```
+
+ Example 2: The same as Example 1, except with `normalize` set to `False`.
+ ```python
+ >>> accuracy_metric = evaluate.load("accuracy")
+ >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], normalize=False)
+ >>> print(results)
+ {'accuracy': 3.0}
+ ```
+
+ Example 3: The same as Example 1, except with `sample_weight` set.
+ ```python
+ >>> accuracy_metric = evaluate.load("accuracy")
+ >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
+ >>> print(results)
+ {'accuracy': 0.8778625954198473}
+ ```
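Example 3's weighted score can be checked by hand: the metric divides the total weight of the correctly classified samples by the total weight of all samples. A minimal plain-Python sketch of that arithmetic (no `evaluate` dependency):

```python
refs = [0, 1, 2, 0, 1, 2]
preds = [0, 1, 1, 2, 1, 0]
weights = [0.5, 2, 0.7, 0.5, 9, 0.4]

# Correct predictions are at positions 0, 1, and 4 -> weight 0.5 + 2 + 9 = 11.5
correct = sum(w for r, p, w in zip(refs, preds, weights) if r == p)
total = sum(weights)  # 13.1
print(correct / total)  # ≈ 0.87786, matching Example 3
```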
+
+
+ ## Limitations and Bias
+ This metric can easily be misleading, especially with unbalanced classes. A high accuracy may mean a model is doing well, but on unbalanced data it may simply mean the model is correctly labeling only the high-frequency class. In such cases, a more detailed analysis of the model's behavior, or a different metric entirely, is needed to determine how well the model is actually performing.
+
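To make the caveat concrete, here is a sketch with hypothetical data: a degenerate model that always predicts the majority class still scores high accuracy on an unbalanced label set, despite learning nothing about the minority class.

```python
refs = [0] * 95 + [1] * 5   # 95% of the examples belong to class 0
preds = [0] * 100           # degenerate model: always predict class 0

accuracy = sum(r == p for r, p in zip(refs, preds)) / len(refs)
print(accuracy)  # 0.95, even though every positive example is missed
```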
+ ## Citation(s)
+ ```bibtex
+ @article{scikit-learn,
+   title={Scikit-learn: Machine Learning in {P}ython},
+   author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+           and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+           and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+           Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+   journal={Journal of Machine Learning Research},
+   volume={12},
+   pages={2825--2830},
+   year={2011}
+ }
+ ```
+
+
+ ## Further References
accuracy.py ADDED
@@ -0,0 +1,106 @@
+ # Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """Accuracy metric."""
+
+ import datasets
+ from sklearn.metrics import accuracy_score
+
+ import evaluate
+
+
+ _DESCRIPTION = """
+ Accuracy is the proportion of correct predictions among the total number of cases processed. It can be computed with:
+ Accuracy = (TP + TN) / (TP + TN + FP + FN)
+ Where:
+ TP: True positive
+ TN: True negative
+ FP: False positive
+ FN: False negative
+ """
+
+
+ _KWARGS_DESCRIPTION = """
+ Args:
+     predictions (`list` of `int`): Predicted labels.
+     references (`list` of `int`): Ground truth labels.
+     normalize (`boolean`): If set to False, returns the number of correctly classified samples. Otherwise, returns the fraction of correctly classified samples. Defaults to True.
+     sample_weight (`list` of `float`): Sample weights. Defaults to None.
+
+ Returns:
+     accuracy (`float` or `int`): Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0 if `normalize` is set to `True`, or the number of examples input if `normalize` is set to `False`. A higher score means higher accuracy.
+
+ Examples:
+
+     Example 1: A simple example
+         >>> accuracy_metric = evaluate.load("accuracy")
+         >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
+         >>> print(results)
+         {'accuracy': 0.5}
+
+     Example 2: The same as Example 1, except with `normalize` set to `False`.
+         >>> accuracy_metric = evaluate.load("accuracy")
+         >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], normalize=False)
+         >>> print(results)
+         {'accuracy': 3.0}
+
+     Example 3: The same as Example 1, except with `sample_weight` set.
+         >>> accuracy_metric = evaluate.load("accuracy")
+         >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
+         >>> print(results)
+         {'accuracy': 0.8778625954198473}
+ """
+
+
+ _CITATION = """
+ @article{scikit-learn,
+   title={Scikit-learn: Machine Learning in {P}ython},
+   author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+           and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+           and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+           Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+   journal={Journal of Machine Learning Research},
+   volume={12},
+   pages={2825--2830},
+   year={2011}
+ }
+ """
+
+
+ @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
+ class Accuracy(evaluate.EvaluationModule):
+     def _info(self):
+         return evaluate.EvaluationModuleInfo(
+             description=_DESCRIPTION,
+             citation=_CITATION,
+             inputs_description=_KWARGS_DESCRIPTION,
+             features=datasets.Features(
+                 {
+                     "predictions": datasets.Sequence(datasets.Value("int32")),
+                     "references": datasets.Sequence(datasets.Value("int32")),
+                 }
+                 if self.config_name == "multilabel"
+                 else {
+                     "predictions": datasets.Value("int32"),
+                     "references": datasets.Value("int32"),
+                 }
+             ),
+             reference_urls=["https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html"],
+         )
+
+     def _compute(self, predictions, references, normalize=True, sample_weight=None):
+         return {
+             "accuracy": float(
+                 accuracy_score(references, predictions, normalize=normalize, sample_weight=sample_weight)
+             )
+         }
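For the `multilabel` configuration above, each prediction and reference is a sequence of labels, and `accuracy_score` counts an example as correct only when the entire label vector matches the reference (subset accuracy). A dependency-free sketch of that behavior, with hypothetical data:

```python
def exact_match_accuracy(references, predictions):
    """Fraction of examples whose full label vector matches exactly,
    mirroring sklearn's subset accuracy for multilabel inputs."""
    matches = sum(r == p for r, p in zip(references, predictions))
    return matches / len(references)

refs = [[1, 0], [1, 1], [0, 1]]
preds = [[1, 0], [0, 1], [0, 1]]
print(exact_match_accuracy(refs, preds))  # 2 of 3 vectors match -> 2/3
```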
app.py ADDED
@@ -0,0 +1,6 @@
+ import evaluate
+ from evaluate.utils import launch_gradio_widget
+
+
+ module = evaluate.load("accuracy")
+ launch_gradio_widget(module)
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ # TODO: fix github to release
+ git+https://github.com/huggingface/evaluate.git@b6e6ed7f3e6844b297bff1b43a1b4be0709b9671
+ sklearn
+ datasets~=2.0