lvwerra (HF staff) committed on
Commit 636a855
1 Parent(s): d5130c6

Update Space (evaluate main: 2dfe5d9e)

Files changed (4)
  1. README.md +99 -5
  2. app.py +6 -0
  3. brier_score.py +134 -0
  4. requirements.txt +2 -0
README.md CHANGED
@@ -1,12 +1,106 @@
  ---
  title: Brier Score
- emoji: 🏃
- colorFrom: pink
- colorTo: purple
  sdk: gradio
- sdk_version: 3.1.7
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
  title: Brier Score
+ emoji: 🤗
+ colorFrom: blue
+ colorTo: red
  sdk: gradio
+ sdk_version: 3.0.2
  app_file: app.py
  pinned: false
+ tags:
+ - evaluate
+ - metric
+ description: >-
+   The Brier score is a measure of the error between two probability distributions.
  ---

+ # Metric Card for Brier Score
+
+ ## Metric Description
+ Brier score is an evaluation metric for classification tasks with binary outcomes, such as win/lose, spam/ham, or click/no-click:
+
+ `BrierScore = 1/N * sum( (p_i - o_i)^2 )`
+
+ where `p_i` is the predicted probability that the event occurs, and `o_i` is 1 if the event occurred and 0 otherwise. The lower the score, the better the prediction.
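As a quick sanity check, the formula can be evaluated directly with NumPy (an illustrative sketch; the metric itself delegates the computation to scikit-learn):

```python
import numpy as np

# o_i: 1 if the event occurred, 0 otherwise
outcomes = np.array([0, 0, 1, 1])
# p_i: predicted probability that the event occurs
probabilities = np.array([0.1, 0.9, 0.8, 0.3])

# BrierScore = 1/N * sum((p_i - o_i)^2)
brier = np.mean((probabilities - outcomes) ** 2)
print(round(brier, 4))  # -> 0.3375
```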
+ ## How to Use
+
+ At minimum, this metric requires predictions and references as inputs.
+
+ ```python
+ >>> brier_score = evaluate.load("brier_score")
+ >>> predictions = np.array([0.1, 0.9, 0.8, 0.3])
+ >>> references = np.array([0, 0, 1, 1])
+ >>> results = brier_score.compute(predictions=predictions, references=references)
+ ```
+
+ ### Inputs
+
+ Mandatory inputs:
+ - `predictions`: numeric array-like of shape (`n_samples,`) or (`n_samples`, `n_outputs`), representing the estimated probabilities of the positive class.
+ - `references`: array-like of shape (`n_samples,`) or (`n_samples`, `n_outputs`), representing the ground truth (correct) target values.
+
+ Optional arguments:
+ - `sample_weight`: numeric array-like of shape (`n_samples,`) representing sample weights. The default is `None`.
+ - `pos_label`: the label of the positive class. The default is `1`.
+
+ ### Output Values
+
+ This metric returns a dictionary with the following key:
+ - `brier_score` (`float`): the computed Brier score.
+
+ Output example:
+ ```python
+ {'brier_score': 0.5}
+ ```
+
+ #### Values from Popular Papers
+
+ ### Examples
+
+ ```python
+ >>> brier_score = evaluate.load("brier_score")
+ >>> predictions = np.array([0.1, 0.9, 0.8, 0.3])
+ >>> references = np.array([0, 0, 1, 1])
+ >>> results = brier_score.compute(predictions=predictions, references=references)
+ >>> print(results)
+ {'brier_score': 0.3375}
+ ```
+ If `references` contains string labels, an error will be raised unless `pos_label` is explicitly specified:
+ ```python
+ >>> brier_score = evaluate.load("brier_score")
+ >>> predictions = np.array([0.1, 0.9, 0.8, 0.3])
+ >>> references = np.array(["spam", "ham", "ham", "spam"])
+ >>> results = brier_score.compute(predictions=predictions, references=references, pos_label="ham")
+ >>> print(results)
+ {'brier_score': 0.0375}
+ ```
+ ## Limitations and Bias
+ The [brier_score](https://huggingface.co/metrics/brier_score) is appropriate for binary and categorical outcomes that can be structured as true or false, but it is inappropriate for ordinal variables, which can take on three or more values.
+
+ ## Citation(s)
+ ```bibtex
+ @article{scikit-learn,
+   title={Scikit-learn: Machine Learning in {P}ython},
+   author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+           and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+           and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+           Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+   journal={Journal of Machine Learning Research},
+   volume={12},
+   pages={2825--2830},
+   year={2011}
+ }
+
+ @article{brier1950verification,
+   title={Verification of forecasts expressed in terms of probability},
+   author={Brier, Glenn W and others},
+   journal={Monthly Weather Review},
+   volume={78},
+   number={1},
+   pages={1--3},
+   year={1950}
+ }
+ ```
+
+ ## Further References
+ - [Brier Score - Wikipedia](https://en.wikipedia.org/wiki/Brier_score)
app.py ADDED
@@ -0,0 +1,6 @@
+ import evaluate
+ from evaluate.utils import launch_gradio_widget
+
+
+ module = evaluate.load("brier_score")
+ launch_gradio_widget(module)
brier_score.py ADDED
@@ -0,0 +1,134 @@
+ # Copyright 2022 The HuggingFace Datasets Authors and the current dataset script contributor.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """Brier Score Metric"""
+
+ import datasets
+ from sklearn.metrics import brier_score_loss
+
+ import evaluate
+
+
+ _CITATION = """\
+ @article{scikit-learn,
+   title={Scikit-learn: Machine Learning in {P}ython},
+   author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+           and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+           and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+           Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+   journal={Journal of Machine Learning Research},
+   volume={12},
+   pages={2825--2830},
+   year={2011}
+ }
+ """
+
+ _DESCRIPTION = """\
+ Brier score is a type of evaluation metric for classification tasks, where you predict outcomes such as win/lose, spam/ham, click/no-click etc.
+ `BrierScore = 1/N * sum( (p_i - o_i)^2 )`
+ """
+
+
+ _KWARGS_DESCRIPTION = """
+ Args:
+     y_true : array of shape (n_samples,)
+         True targets.
+
+     y_prob : array of shape (n_samples,)
+         Probabilities of the positive class.
+
+     sample_weight : array-like of shape (n_samples,), default=None
+         Sample weights.
+
+     pos_label : int or str, default=None
+         Label of the positive class. `pos_label` will be inferred in the
+         following manner:
+
+         * if `y_true` in {-1, 1} or {0, 1}, `pos_label` defaults to 1;
+         * else if `y_true` contains string, an error will be raised and
+           `pos_label` should be explicitly specified;
+         * otherwise, `pos_label` defaults to the greater label,
+           i.e. `np.unique(y_true)[-1]`.
+
+ Returns
+     score : float
+         Brier score loss.
+
+ Examples:
+     Example-1: if y_true in {-1, 1} or {0, 1}, pos_label defaults to 1.
+         >>> import numpy as np
+         >>> brier_score = evaluate.load("brier_score")
+         >>> references = np.array([0, 0, 1, 1])
+         >>> predictions = np.array([0.1, 0.9, 0.8, 0.3])
+         >>> results = brier_score.compute(references=references, predictions=predictions)
+         >>> print(round(results["brier_score"], 4))
+         0.3375
+
+     Example-2: if y_true contains string, an error will be raised and pos_label should be explicitly specified.
+         >>> import numpy as np
+         >>> brier_score = evaluate.load("brier_score")
+         >>> references = np.array(["spam", "ham", "ham", "spam"])
+         >>> predictions = np.array([0.1, 0.9, 0.8, 0.3])
+         >>> results = brier_score.compute(references=references, predictions=predictions, pos_label="ham")
+         >>> print(round(results["brier_score"], 4))
+         0.0375
+ """
+
+
+ @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
+ class BrierScore(evaluate.Metric):
+     def _info(self):
+         return evaluate.MetricInfo(
+             description=_DESCRIPTION,
+             citation=_CITATION,
+             inputs_description=_KWARGS_DESCRIPTION,
+             features=self._get_feature_types(),
+             reference_urls=["https://scikit-learn.org/stable/modules/generated/sklearn.metrics.brier_score_loss.html"],
+         )
+
+     def _get_feature_types(self):
+         if self.config_name == "multilist":
+             return [
+                 datasets.Features(
+                     {
+                         "references": datasets.Sequence(datasets.Value("float")),
+                         "predictions": datasets.Sequence(datasets.Value("float")),
+                     }
+                 ),
+                 datasets.Features(
+                     {
+                         "references": datasets.Sequence(datasets.Value("string")),
+                         "predictions": datasets.Sequence(datasets.Value("float")),
+                     }
+                 ),
+             ]
+         else:
+             return [
+                 datasets.Features(
+                     {
+                         "references": datasets.Value("float"),
+                         "predictions": datasets.Value("float"),
+                     }
+                 ),
+                 datasets.Features(
+                     {
+                         "references": datasets.Value("string"),
+                         "predictions": datasets.Value("float"),
+                     }
+                 ),
+             ]
+
+     def _compute(self, references, predictions, sample_weight=None, pos_label=1):
+         brier_score = brier_score_loss(references, predictions, sample_weight=sample_weight, pos_label=pos_label)
+         return {"brier_score": brier_score}
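For intuition, the `pos_label` inference rules quoted in the docstring above can be sketched in plain NumPy (an illustrative sketch only; scikit-learn's actual implementation differs in details):

```python
import numpy as np

def infer_pos_label(y_true):
    # Illustrative sketch of the inference rules described in the
    # docstring above; not scikit-learn's actual implementation.
    labels = np.unique(y_true)
    if labels.dtype.kind in ("U", "S", "O"):
        # y_true contains strings: pos_label must be given explicitly
        raise ValueError("pos_label should be explicitly specified")
    if set(labels.tolist()) <= {-1, 1} or set(labels.tolist()) <= {0, 1}:
        return 1
    return labels[-1]  # otherwise: the greater label, np.unique(y_true)[-1]

print(infer_pos_label(np.array([0, 1, 1, 0])))  # -> 1
print(infer_pos_label(np.array([2, 3, 3, 2])))  # -> 3
```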
requirements.txt ADDED
@@ -0,0 +1,2 @@
+ git+https://github.com/huggingface/evaluate@2dfe5d9e9d7373e48c82d19930a80559ea8cc4af
+ sklearn