lvwerra committed
Commit e4482cc
1 Parent(s): 745aead

Update Space (evaluate main: 828c6327)

Files changed (4)
  1. README.md +127 -5
  2. app.py +6 -0
  3. mse.py +119 -0
  4. requirements.txt +4 -0
README.md CHANGED
@@ -1,12 +1,134 @@
  ---
- title: Mse
- emoji: 📚
- colorFrom: gray
- colorTo: gray
+ title: MSE
+ emoji: 🤗
+ colorFrom: blue
+ colorTo: red
  sdk: gradio
  sdk_version: 3.0.2
  app_file: app.py
  pinned: false
+ tags:
+ - evaluate
+ - metric
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference
+ # Metric Card for MSE
+
+
+ ## Metric Description
+
+ Mean Squared Error (MSE) represents the average of the squares of the errors, i.e. the average squared difference between the estimated values and the actual values.
+
+ ![image](https://user-images.githubusercontent.com/14205986/165999302-eba3702d-81e3-4363-9c0e-d3bfceb7ec5a.png)
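+
+ Written out, the formula shown in the image corresponds to the standard definition over `n` samples, with references `y_i` and predictions `ŷ_i`:
+
+ ```latex
+ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
+ ```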
+
+ ## How to Use
+
+ At minimum, this metric requires predictions and references as inputs.
+
+ ```python
+ >>> mse_metric = evaluate.load("mse")
+ >>> predictions = [2.5, 0.0, 2, 8]
+ >>> references = [3, -0.5, 2, 7]
+ >>> results = mse_metric.compute(predictions=predictions, references=references)
+ ```
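+
+ The result is a dictionary, so the score itself can be read from its `mse` key:
+
+ ```python
+ >>> print(results["mse"])
+ 0.375
+ ```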
+
+ ### Inputs
+
+ Mandatory inputs:
+ - `predictions`: numeric array-like of shape (`n_samples,`) or (`n_samples`, `n_outputs`), representing the estimated target values.
+ - `references`: numeric array-like of shape (`n_samples,`) or (`n_samples`, `n_outputs`), representing the ground truth (correct) target values.
+
+ Optional arguments:
+ - `sample_weight`: numeric array-like of shape (`n_samples,`) representing sample weights. The default is `None`.
+ - `multioutput`: `raw_values`, `uniform_average` or numeric array-like of shape (`n_outputs,`), which defines the aggregation of multiple output values. The default value is `uniform_average`.
+   - `raw_values` returns a full set of errors in case of multioutput input.
+   - `uniform_average` means that the errors of all outputs are averaged with uniform weight.
+   - the array-like value defines weights used to average errors.
+ - `squared` (`bool`): if `True`, returns the MSE value; if `False`, returns the RMSE (Root Mean Squared Error). The default value is `True`.
+
+
+ ### Output Values
+ This metric outputs a dictionary containing the mean squared error score, which is of type:
+ - `float`: if multioutput is `uniform_average` or an ndarray of weights, then the weighted average of all output errors is returned.
+ - numeric array-like of shape (`n_outputs,`): if multioutput is `raw_values`, then the score is returned for each output separately.
+
+ Each MSE `float` value is non-negative with no upper bound; lower is better, and the best possible value is `0.0`.
+
+ Output Example(s):
+ ```python
+ {'mse': 0.5}
+ ```
+
+ If `multioutput="raw_values"`:
+ ```python
+ {'mse': array([0.41666667, 1. ])}
+ ```
+
+ #### Values from Popular Papers
+
+
+ ### Examples
+
+ Example with the `uniform_average` config:
+ ```python
+ >>> mse_metric = evaluate.load("mse")
+ >>> predictions = [2.5, 0.0, 2, 8]
+ >>> references = [3, -0.5, 2, 7]
+ >>> results = mse_metric.compute(predictions=predictions, references=references)
+ >>> print(results)
+ {'mse': 0.375}
+ ```
+
+ Example with `squared = False`, which returns the RMSE:
+ ```python
+ >>> mse_metric = evaluate.load("mse")
+ >>> predictions = [2.5, 0.0, 2, 8]
+ >>> references = [3, -0.5, 2, 7]
+ >>> rmse_result = mse_metric.compute(predictions=predictions, references=references, squared=False)
+ >>> print(rmse_result)
+ {'mse': 0.6123724356957945}
+ ```
+
+ Example with multi-dimensional lists, and the `raw_values` config:
+ ```python
+ >>> mse_metric = evaluate.load("mse", "multilist")
+ >>> predictions = [[0.5, 1], [-1, 1], [7, -6]]
+ >>> references = [[0, 2], [-1, 2], [8, -5]]
+ >>> results = mse_metric.compute(predictions=predictions, references=references, multioutput='raw_values')
+ >>> print(results)
+ {'mse': array([0.41666667, 1. ])}
+ ```
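+
+ The `sample_weight` argument is passed through to scikit-learn, so per-sample weights can be supplied in the same way. A minimal sketch, with weights chosen purely for illustration:
+ ```python
+ >>> mse_metric = evaluate.load("mse")
+ >>> predictions = [2.5, 0.0, 2, 8]
+ >>> references = [3, -0.5, 2, 7]
+ >>> weights = [1, 1, 1, 2]  # illustrative weights: the last sample counts twice
+ >>> results = mse_metric.compute(predictions=predictions, references=references, sample_weight=weights)
+ >>> print(results)
+ {'mse': 0.5}
+ ```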
+
+ ## Limitations and Bias
+ MSE has the disadvantage of heavily weighting outliers: because the errors are squared, large errors count much more heavily than small ones. It can be used alongside [MAE](https://huggingface.co/metrics/mae), which is complementary in that it does not square the errors, as illustrated below.
+
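+ A small illustration of this sensitivity (assuming the linked `mae` metric loads the same way and reports its score under an `mae` key):
+ ```python
+ >>> mse_metric = evaluate.load("mse")
+ >>> mae_metric = evaluate.load("mae")
+ >>> references = [0.0, 0.0, 0.0, 0.0]
+ >>> predictions = [0.5, 0.5, 0.5, 10.0]  # one large outlier
+ >>> mse_metric.compute(predictions=predictions, references=references)
+ {'mse': 25.1875}
+ >>> mae_metric.compute(predictions=predictions, references=references)
+ {'mae': 2.875}
+ ```
+ The single outlier dominates the MSE far more than the MAE, which is why the two are often reported together.
+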
+ ## Citation(s)
+ ```bibtex
+ @article{scikit-learn,
+   title={Scikit-learn: Machine Learning in {P}ython},
+   author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+          and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+          and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+          Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+   journal={Journal of Machine Learning Research},
+   volume={12},
+   pages={2825--2830},
+   year={2011}
+ }
+ ```
+
+ ```bibtex
+ @article{willmott2005advantages,
+   title={Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance},
+   author={Willmott, Cort J and Matsuura, Kenji},
+   journal={Climate research},
+   volume={30},
+   number={1},
+   pages={79--82},
+   year={2005}
+ }
+ ```
+
+ ## Further References
+ - [Mean Squared Error - Wikipedia](https://en.wikipedia.org/wiki/Mean_squared_error)
app.py ADDED
@@ -0,0 +1,6 @@
+ import evaluate
+ from evaluate.utils import launch_gradio_widget
+
+
+ module = evaluate.load("mse")
+ launch_gradio_widget(module)
mse.py ADDED
@@ -0,0 +1,119 @@
+ # Copyright 2022 The HuggingFace Datasets Authors and the current dataset script contributor.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """MSE - Mean Squared Error Metric"""
+
+ import datasets
+ from sklearn.metrics import mean_squared_error
+
+ import evaluate
+
+
+ _CITATION = """\
+ @article{scikit-learn,
+   title={Scikit-learn: Machine Learning in {P}ython},
+   author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+          and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+          and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+          Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+   journal={Journal of Machine Learning Research},
+   volume={12},
+   pages={2825--2830},
+   year={2011}
+ }
+ """
+
+ _DESCRIPTION = """\
+ Mean Squared Error (MSE) is the average of the square of the difference between the predicted
+ and actual values.
+ """
+
+
+ _KWARGS_DESCRIPTION = """
+ Args:
+     predictions: array-like of shape (n_samples,) or (n_samples, n_outputs)
+         Estimated target values.
+     references: array-like of shape (n_samples,) or (n_samples, n_outputs)
+         Ground truth (correct) target values.
+     sample_weight: array-like of shape (n_samples,), default=None
+         Sample weights.
+     multioutput: {"raw_values", "uniform_average"} or array-like of shape (n_outputs,), default="uniform_average"
+         Defines aggregating of multiple output values. Array-like value defines weights used to average errors.
+
+         "raw_values" : Returns a full set of errors in case of multioutput input.
+
+         "uniform_average" : Errors of all outputs are averaged with uniform weight.
+
+     squared : bool, default=True
+         If True returns MSE value, if False returns RMSE (Root Mean Squared Error) value.
+
+ Returns:
+     mse : mean squared error.
+ Examples:
+
+     >>> mse_metric = evaluate.load("mse")
+     >>> predictions = [2.5, 0.0, 2, 8]
+     >>> references = [3, -0.5, 2, 7]
+     >>> results = mse_metric.compute(predictions=predictions, references=references)
+     >>> print(results)
+     {'mse': 0.375}
+     >>> rmse_result = mse_metric.compute(predictions=predictions, references=references, squared=False)
+     >>> print(rmse_result)
+     {'mse': 0.6123724356957945}
+
+     If you're using multi-dimensional lists, then set the config as follows:
+
+     >>> mse_metric = evaluate.load("mse", "multilist")
+     >>> predictions = [[0.5, 1], [-1, 1], [7, -6]]
+     >>> references = [[0, 2], [-1, 2], [8, -5]]
+     >>> results = mse_metric.compute(predictions=predictions, references=references)
+     >>> print(results)
+     {'mse': 0.7083333333333334}
+     >>> results = mse_metric.compute(predictions=predictions, references=references, multioutput='raw_values')
+     >>> print(results) # doctest: +NORMALIZE_WHITESPACE
+     {'mse': array([0.41666667, 1. ])}
+ """
+
+
+ @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
+ class Mse(evaluate.EvaluationModule):
+     def _info(self):
+         return evaluate.EvaluationModuleInfo(
+             description=_DESCRIPTION,
+             citation=_CITATION,
+             inputs_description=_KWARGS_DESCRIPTION,
+             features=datasets.Features(self._get_feature_types()),
+             reference_urls=[
+                 "https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html"
+             ],
+         )
+
+     def _get_feature_types(self):
+         if self.config_name == "multilist":
+             return {
+                 "predictions": datasets.Sequence(datasets.Value("float")),
+                 "references": datasets.Sequence(datasets.Value("float")),
+             }
+         else:
+             return {
+                 "predictions": datasets.Value("float"),
+                 "references": datasets.Value("float"),
+             }
+
+     def _compute(self, predictions, references, sample_weight=None, multioutput="uniform_average", squared=True):
+
+         mse = mean_squared_error(
+             references, predictions, sample_weight=sample_weight, multioutput=multioutput, squared=squared
+         )
+
+         return {"mse": mse}
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ # TODO: fix github to release
+ git+https://github.com/huggingface/evaluate.git@b6e6ed7f3e6844b297bff1b43a1b4be0709b9671
+ datasets~=2.0
+ sklearn