lvwerra (HF staff) committed
Commit 7300e19
1 Parent(s): 9e7941a

Update Space (evaluate main: 7e21410f)

Files changed (4)
  1. README.md +121 -6
  2. app.py +6 -0
  3. mase.py +140 -0
  4. requirements.txt +2 -0
README.md CHANGED
@@ -1,12 +1,127 @@
  ---
- title: Mase
- emoji: 🐨
- colorFrom: red
- colorTo: gray
+ title: MASE
+ emoji: 🤗
+ colorFrom: blue
+ colorTo: red
  sdk: gradio
- sdk_version: 3.9
+ sdk_version: 3.0.2
  app_file: app.py
  pinned: false
+ tags:
+ - evaluate
+ - metric
+ description: >-
+   Mean Absolute Scaled Error (MASE) is the mean absolute error of the forecast values, divided by the mean absolute error of the in-sample one-step naive forecast on the training set.
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Metric Card for MASE
+
+ ## Metric Description
+
+ Mean Absolute Scaled Error (MASE) is the mean absolute error of the forecast values, divided by the mean absolute error of the in-sample one-step naive forecast. For predictions $x_i$, corresponding ground-truth values $y_i$, and training data $z_t$ with seasonality $p$, the metric is given by:
+
+ ![image](https://user-images.githubusercontent.com/8100/200009284-7ce4ccaa-373c-42f0-acbb-f81d52a97512.png)
+
+ This metric:
+ * is independent of the scale of the data;
+ * has predictable behavior when predicted/ground-truth data is near zero;
+ * is symmetric;
+ * is interpretable, as values greater than one indicate that in-sample one-step forecasts from the naïve method perform better than the forecast values under consideration.
+
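+ As a cross-check of the definition, here is a minimal NumPy sketch (an illustration for one-dimensional inputs, not the module's implementation):
+
+ ```python
+ import numpy as np
+
+ def mase(predictions, references, training, periodicity=1):
+     predictions, references, training = map(np.asarray, (predictions, references, training))
+     # Numerator: mean absolute error of the forecast values.
+     mae_forecast = np.mean(np.abs(references - predictions))
+     # Denominator: MAE of the in-sample naive forecast that predicts z_t with z_{t-p}.
+     mae_naive = np.mean(np.abs(training[periodicity:] - training[:-periodicity]))
+     return mae_forecast / mae_naive
+
+ mase([2.5, 0.0, 2, 8], [3, -0.5, 2, 7], [5, 0.5, 4, 6, 3, 5, 2])  # 0.1666...
+ ```
+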
+ ## How to Use
+
+ At minimum, this metric requires predictions, references, and training data as inputs.
+
+ ```python
+ >>> mase_metric = evaluate.load("mase")
+ >>> predictions = [2.5, 0.0, 2, 8]
+ >>> references = [3, -0.5, 2, 7]
+ >>> training = [5, 0.5, 4, 6, 3, 5, 2]
+ >>> results = mase_metric.compute(predictions=predictions, references=references, training=training)
+ ```
+
+ ### Inputs
+
+ Mandatory inputs:
+ - `predictions`: numeric array-like of shape (`n_samples,`) or (`n_samples`, `n_outputs`), representing the estimated target values.
+ - `references`: numeric array-like of shape (`n_samples,`) or (`n_samples`, `n_outputs`), representing the ground truth (correct) target values.
+ - `training`: numeric array-like of shape (`n_train_samples,`) or (`n_train_samples`, `n_outputs`), representing the in-sample training data used for the naive forecast.
+
+ Optional arguments:
+ - `periodicity`: the seasonal periodicity of the training data; see the sketch after this list. The default is 1.
+ - `sample_weight`: numeric array-like of shape (`n_samples,`) representing sample weights. The default is `None`.
+ - `multioutput`: `raw_values`, `uniform_average`, or numeric array-like of shape (`n_outputs,`), which defines the aggregation of multiple output values. The default value is `uniform_average`.
+   - `raw_values` returns a full set of errors in case of multioutput input.
+   - `uniform_average` means that the errors of all outputs are averaged with uniform weight.
+   - an array-like value defines the weights used to average the errors.
+
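+ For seasonal data, `periodicity` sets the lag of the naive forecast. A small sketch (the quarterly-style values below are illustrative assumptions):
+
+ ```python
+ >>> mase_metric = evaluate.load("mase")
+ >>> training = [10, 20, 30, 40, 12, 22, 33, 44]  # two "years" of quarterly data
+ >>> predictions = [14, 24]
+ >>> references = [13, 23]
+ >>> results = mase_metric.compute(predictions=predictions, references=references, training=training, periodicity=4)
+ >>> print(results)
+ {'mase': 0.3636...}
+ ```
+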
+ ### Output Values
+
+ This metric outputs a dictionary containing the mean absolute scaled error score, which is of type:
+ - `float`: if multioutput is `uniform_average` or an ndarray of weights, then the weighted average of all output errors is returned.
+ - numeric array-like of shape (`n_outputs,`): if multioutput is `raw_values`, then the score is returned for each output separately.
+
+ Each MASE `float` value is non-negative, with the best value being `0.0`; values greater than `1.0` indicate that the forecast performs worse than the in-sample one-step naive forecast.
+
+ Output Example(s):
+ ```python
+ {'mase': 0.5}
+ ```
+
+ If `multioutput="raw_values"`:
+ ```python
+ {'mase': array([0.5, 1. ])}
+ ```
+
+ #### Values from Popular Papers
+
+ ### Examples
+
+ Example with the `uniform_average` config:
+ ```python
+ >>> mase_metric = evaluate.load("mase")
+ >>> predictions = [2.5, 0.0, 2, 8, 1.25]
+ >>> references = [3, -0.5, 2, 7, 2]
+ >>> training = [5, 0.5, 4, 6, 3, 5, 2]
+ >>> results = mase_metric.compute(predictions=predictions, references=references, training=training)
+ >>> print(results)
+ {'mase': 0.1833...}
+ ```
+
+ Example with multi-dimensional lists, and the `raw_values` config:
+ ```python
+ >>> mase_metric = evaluate.load("mase", "multilist")
+ >>> predictions = [[0, 2], [-1, 2], [8, -5]]
+ >>> references = [[0.5, 1], [-1, 1], [7, -6]]
+ >>> training = [[0.5, 1], [-1, 1], [7, -6]]
+ >>> results = mase_metric.compute(predictions=predictions, references=references, training=training)
+ >>> print(results)
+ {'mase': 0.1818...}
+ >>> results = mase_metric.compute(predictions=predictions, references=references, training=training, multioutput='raw_values')
+ >>> print(results)
+ {'mase': array([0.1052..., 0.2857...])}
+ ```
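+
+ As shown in the module docstring, `multioutput` also accepts an array-like of weights used to average the per-output errors:
+ ```python
+ >>> results = mase_metric.compute(predictions=predictions, references=references, training=training, multioutput=[0.3, 0.7])
+ >>> print(results)
+ {'mase': 0.2193...}
+ ```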
+
+ ## Limitations and Bias
+
+ If the training series is (near-)constant, the naive in-sample forecast has (near-)zero absolute error; the implementation clamps the denominator to machine epsilon, so the resulting score becomes very large rather than undefined, as the sketch below illustrates.
+
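+ ```python
+ >>> mase_metric = evaluate.load("mase")
+ >>> # Constant training series: the naive in-sample forecast is perfect,
+ >>> # so the denominator is clamped to machine epsilon.
+ >>> results = mase_metric.compute(predictions=[1.1], references=[1.0], training=[1, 1, 1, 1])
+ >>> results["mase"] > 1e12
+ True
+ ```
+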
+ ## Citation(s)
+
+ ```bibtex
+ @article{HYNDMAN2006679,
+     title = {Another look at measures of forecast accuracy},
+     journal = {International Journal of Forecasting},
+     volume = {22},
+     number = {4},
+     pages = {679--688},
+     year = {2006},
+     issn = {0169-2070},
+     doi = {https://doi.org/10.1016/j.ijforecast.2006.03.001},
+     url = {https://www.sciencedirect.com/science/article/pii/S0169207006000239},
+     author = {Rob J. Hyndman and Anne B. Koehler},
+ }
+ ```
+
+ ## Further References
+ - [Mean absolute scaled error - Wikipedia](https://en.wikipedia.org/wiki/Mean_absolute_scaled_error)
app.py ADDED
@@ -0,0 +1,6 @@
+ import evaluate
+ from evaluate.utils import launch_gradio_widget
+
+
+ module = evaluate.load("mase")
+ launch_gradio_widget(module)
mase.py ADDED
@@ -0,0 +1,140 @@
+ # Copyright 2022 The HuggingFace Datasets Authors and the current dataset script contributor.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """MASE - Mean Absolute Scaled Error Metric"""
+
+ import datasets
+ import numpy as np
+ from sklearn.metrics import mean_absolute_error
+
+ import evaluate
+
+
+ _CITATION = """\
+ @article{HYNDMAN2006679,
+     title = {Another look at measures of forecast accuracy},
+     journal = {International Journal of Forecasting},
+     volume = {22},
+     number = {4},
+     pages = {679--688},
+     year = {2006},
+     issn = {0169-2070},
+     doi = {https://doi.org/10.1016/j.ijforecast.2006.03.001},
+     url = {https://www.sciencedirect.com/science/article/pii/S0169207006000239},
+     author = {Rob J. Hyndman and Anne B. Koehler},
+ }
+ """
+
+ _DESCRIPTION = """\
+ Mean Absolute Scaled Error (MASE) is the mean absolute error of the forecast values, divided by the mean absolute error of the in-sample one-step naive forecast.
+ """
+
+
+ _KWARGS_DESCRIPTION = """
+ Args:
+     predictions: array-like of shape (n_samples,) or (n_samples, n_outputs)
+         Estimated target values.
+     references: array-like of shape (n_samples,) or (n_samples, n_outputs)
+         Ground truth (correct) target values.
+     training: array-like of shape (n_train_samples,) or (n_train_samples, n_outputs)
+         In-sample training data for the naive forecast.
+     periodicity: int, default=1
+         Seasonal periodicity of training data.
+     sample_weight: array-like of shape (n_samples,), default=None
+         Sample weights.
+     multioutput: {"raw_values", "uniform_average"} or array-like of shape (n_outputs,), default="uniform_average"
+         Defines aggregating of multiple output values. Array-like value defines weights used to average errors.
+
+         "raw_values" : Returns a full set of errors in case of multioutput input.
+
+         "uniform_average" : Errors of all outputs are averaged with uniform weight.
+
+ Returns:
+     mase : mean absolute scaled error.
+     If multioutput is "raw_values", then the mean absolute scaled error is returned for each output separately. If multioutput is "uniform_average" or an ndarray of weights, then the weighted average of all output errors is returned.
+     MASE output is non-negative floating point. The best value is 0.0.
+ Examples:
+
+     >>> mase_metric = evaluate.load("mase")
+     >>> predictions = [2.5, 0.0, 2, 8, 1.25]
+     >>> references = [3, -0.5, 2, 7, 2]
+     >>> training = [5, 0.5, 4, 6, 3, 5, 2]
+     >>> results = mase_metric.compute(predictions=predictions, references=references, training=training)
+     >>> print(results)
+     {'mase': 0.18333333333333335}
+
+     If you're using multi-dimensional lists, then set the config as follows:
+
+     >>> mase_metric = evaluate.load("mase", "multilist")
+     >>> predictions = [[0, 2], [-1, 2], [8, -5]]
+     >>> references = [[0.5, 1], [-1, 1], [7, -6]]
+     >>> training = [[0.5, 1], [-1, 1], [7, -6]]
+     >>> results = mase_metric.compute(predictions=predictions, references=references, training=training)
+     >>> print(results)
+     {'mase': 0.18181818181818182}
+     >>> results = mase_metric.compute(predictions=predictions, references=references, training=training, multioutput='raw_values')
+     >>> print(results)
+     {'mase': array([0.10526316, 0.28571429])}
+     >>> results = mase_metric.compute(predictions=predictions, references=references, training=training, multioutput=[0.3, 0.7])
+     >>> print(results)
+     {'mase': 0.21935483870967742}
+ """
+
+
+ @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
+ class Mase(evaluate.Metric):
+     def _info(self):
+         return evaluate.MetricInfo(
+             description=_DESCRIPTION,
+             citation=_CITATION,
+             inputs_description=_KWARGS_DESCRIPTION,
+             features=datasets.Features(self._get_feature_types()),
+             reference_urls=["https://otexts.com/fpp3/accuracy.html#scaled-errors"],
+         )
+
+     def _get_feature_types(self):
+         # The "multilist" config accepts a sequence per sample (multi-output);
+         # the default config accepts scalar values.
+         if self.config_name == "multilist":
+             return {
+                 "predictions": datasets.Sequence(datasets.Value("float")),
+                 "references": datasets.Sequence(datasets.Value("float")),
+             }
+         else:
+             return {
+                 "predictions": datasets.Value("float"),
+                 "references": datasets.Value("float"),
+             }
+     def _compute(
+         self,
+         predictions,
+         references,
+         training,
+         periodicity=1,
+         sample_weight=None,
+         multioutput="uniform_average",
+     ):
+         # MAE of the in-sample seasonal naive forecast: predict z_t with z_{t-periodicity}.
+         y_pred_naive = training[:-periodicity]
+         mae_naive = mean_absolute_error(training[periodicity:], y_pred_naive, multioutput=multioutput)
+
+         # MAE of the forecast under evaluation.
+         mae_score = mean_absolute_error(
+             references,
+             predictions,
+             sample_weight=sample_weight,
+             multioutput=multioutput,
+         )
+
+         # Guard against division by zero when the training series is constant.
+         epsilon = np.finfo(np.float64).eps
+         mase_score = mae_score / np.maximum(mae_naive, epsilon)
+
+         return {"mase": mase_score}
requirements.txt ADDED
@@ -0,0 +1,2 @@
+ git+https://github.com/huggingface/evaluate@7e21410f9bcff651452f188b702cc80ecd3530e6
+ scikit-learn