mfumanelli committed on
Commit 4560eb1
1 Parent(s): bdd162d

Updating module
.gitattributes DELETED
@@ -1,27 +0,0 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
- *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zstandard filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text

README.md CHANGED
@@ -1,50 +1,104 @@
 ---
 title: Geometric Mean
- datasets:
- -
- tags:
- - evaluate
- - metric
- description: "TODO: add a description here"
 sdk: gradio
 sdk_version: 3.0.2
 app_file: app.py
 pinned: false
 ---

 # Metric Card for Geometric Mean

- ***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*
-
 ## Metric Description
- *Give a brief overview of this metric, including what task(s) it is usually used for, if any.*

 ## How to Use
- *Give general statement of how to use the metric*

- *Provide simplest possible example for using the metric*

 ### Inputs
- *List all input arguments in the format below*
- - **input_field** *(type): Definition of input, with explanation if necessary. State any default value(s).*

 ### Output Values

- *Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*

- *State the range of possible values that the metric's output can take, as well as what in that range is considered good. For example: "This metric can take on any value between 0 and 100, inclusive. Higher scores are better."*

- #### Values from Popular Papers
- *Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*

 ### Examples
- *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*

 ## Limitations and Bias
 *Note any known limitations or biases that the metric has, with links and references if possible.*

- ## Citation
- *Cite the source where this metric was introduced.*

 ## Further References
 *Add any useful further references.*
 
 ---
 title: Geometric Mean
+ emoji: 🤗
+ colorFrom: blue
+ colorTo: red
 sdk: gradio
 sdk_version: 3.0.2
 app_file: app.py
 pinned: false
+ tags:
+ - evaluate
+ - metric
+ description: >-
+   The geometric mean (G-mean) is the root of the product of class-wise sensitivity.
 ---

 # Metric Card for Geometric Mean

 ## Metric Description
+
+ The geometric mean (G-mean) is the root of the product of class-wise sensitivity.
+ This measure tries to maximize the accuracy on each of the classes while keeping these accuracies balanced.
+ For binary classification, G-mean is the square root of the product of the sensitivity and specificity.

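As a quick sanity check on the definition above, the binary G-mean can be computed by hand in plain Python (the labels below are made up for illustration; with the module's default `'multiclass'` averaging this is equivalent to the root of the product of the two class-wise recalls):

```python
# Hand computation of the binary G-mean (illustrative labels only).
y_true = [0, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

sensitivity = tp / (tp + fn)  # recall on the positive class
specificity = tn / (tn + fp)  # recall on the negative class

print(round((sensitivity * specificity) ** 0.5, 2))  # 0.58, matching Example 1 below
```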
 ## How to Use

+ At minimum, this metric requires predictions and references as input:
+
+ ```python
+ >>> gmean_metric = evaluate.load("geometric_mean")
+ >>> results = gmean_metric.compute(predictions=[0, 1], references=[0, 1])
+ >>> print(results)
+ {'geometric-mean': 1.0}
+ ```

 ### Inputs
+ - **predictions** (`list` of `int`): Predicted labels.
+ - **references** (`list` of `int`): Ground truth labels.
+ - **labels** (`list` of `int`): The set of labels to include when `average != 'binary'`, and their order if `average` is `None`. Labels present in the data can be excluded, for example to calculate a multiclass average ignoring a majority negative class, while labels not present in the data will result in 0 components in a macro average. Defaults to `None`.
+ - **pos_label** (`string` or `int`): The class to report if `average='binary'` and the data is binary. If the data are multiclass, this will be ignored; setting `labels=[pos_label]` and `average != 'binary'` will report scores for that label only. Defaults to 1.
+ - **average** (`string`): If `None`, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data (see the sketch after this list for a preview of `average=None` and `correction`). Defaults to `'multiclass'`.
+   - `'binary'`: Only report results for the class specified by `pos_label`. This is applicable only if the targets (`y_{true,pred}`) are binary.
+   - `'micro'`: Calculate metrics globally by counting the total true positives, false negatives and false positives.
+   - `'macro'`: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
+   - `'weighted'`: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label).
+   - `'samples'`: Calculate metrics for each instance, and find their average (only meaningful for multilabel classification, where this differs from `accuracy_score`).
+ - **sample_weight** (`list` of `float`): Sample weights. Defaults to `None`.
+ - **correction** (`float`): Substitutes the sensitivity of unrecognized classes from zero to a given value. Defaults to 0.0.

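As noted in the inputs list, `average` and `correction` are forwarded to `imblearn.metrics.geometric_mean_score` (see `geometric_mean.py` below), so their effect can be previewed with imblearn directly. A minimal sketch with made-up labels:

```python
from imblearn.metrics import geometric_mean_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# average=None returns one score per class instead of a single aggregate.
per_class = geometric_mean_score(y_true, y_pred, average=None)

# With the default 'multiclass' averaging, a class that is never predicted
# correctly has zero sensitivity and drives the overall score to 0;
# correction substitutes a small value for those zeros instead.
corrected = geometric_mean_score(y_true, y_pred, correction=0.001)

print(per_class, corrected)
```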
  ### Output Values

+ - **geometric-mean** (`float` or `array` of `float`): geometric mean score, or list of geometric mean scores, depending on the value passed to `average`. Minimum possible value is 0. Maximum possible value is 1. Higher geometric mean scores are better.

+ Output Example:
+ ```python
+ {'geometric-mean': 0.4714045207910317}
+ ```

 ### Examples
+
+ Example 1: A simple binary example
+ ```python
+ >>> geometric_mean = evaluate.load("geometric_mean")
+ >>> results = geometric_mean.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0])
+ >>> print(round(results['geometric-mean'], 2))
+ 0.58
+ ```
+
+ Example 2: The same simple binary example as in Example 1, but with `sample_weight` included.
+ ```python
+ >>> geometric_mean = evaluate.load("geometric_mean")
+ >>> results = geometric_mean.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0], sample_weight=[0.9, 0.5, 3.9, 1.2, 0.3])
+ >>> print(round(results['geometric-mean'], 2))
+ 0.35
+ ```
+
+ Example 3: A multiclass example, with `average` equal to `macro`.
+ ```python
+ >>> geometric_mean = evaluate.load("geometric_mean")
+ >>> predictions = [0, 2, 1, 0, 0, 1]
+ >>> references = [0, 1, 2, 0, 1, 2]
+ >>> results = geometric_mean.compute(predictions=predictions, references=references, average="macro")
+ >>> print(round(results['geometric-mean'], 2))
+ 0.47
+ ```

 ## Limitations and Bias
 *Note any known limitations or biases that the metric has, with links and references if possible.*

+ ## Citation(s)
+ ```bibtex
+ @article{imbalanced-learn,
+   title={Imbalanced-learn: A Python Toolbox to Tackle the Curse of
+          Imbalanced Datasets in Machine Learning},
+   author={Lemaître, G. and Nogueira, F. and Aridas, C.},
+   journal={Journal of Machine Learning Research},
+   volume={18},
+   pages={1-5},
+   year={2017}
+ }
+ ```

 ## Further References
 *Add any useful further references.*
__pycache__/geometric_mean.cpython-310.pyc ADDED
Binary file (5.98 kB)

geometric_mean.py CHANGED
@@ -11,85 +11,97 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
- """TODO: Add a description here."""

- import evaluate
 import datasets

-
- # TODO: Add BibTeX citation
- _CITATION = """\
- @InProceedings{huggingface:module,
- title = {A great new module},
- authors={huggingface, Inc.},
- year={2020}
- }
- """
-
- # TODO: Add description of the module here
- _DESCRIPTION = """\
- This new module is designed to solve this great ML task and is crafted with a lot of care.
 """

-
- # TODO: Add description of the arguments of the module here
 _KWARGS_DESCRIPTION = """
 Calculates how good are predictions given some references, using certain scores
 Args:
-     predictions: list of predictions to score. Each predictions
-         should be a string with tokens separated by spaces.
-     references: list of reference for each prediction. Each
-         reference should be a string with tokens separated by spaces.
 Returns:
-     accuracy: description of the first score,
-     another_score: description of the second score,
- Examples:
-     Examples should be written in doctest format, and should illustrate how
-     to use the function.

-     >>> my_new_module = evaluate.load("my_new_module")
-     >>> results = my_new_module.compute(references=[0, 1], predictions=[0, 1])
-     >>> print(results)
-     {'accuracy': 1.0}
 """

- # TODO: Define external resources urls if needed
- BAD_WORDS_URL = "http://url/to/external/resource/bad_words.txt"


 @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
 class GeometricMean(evaluate.Metric):
-     """TODO: Short description of my evaluation module."""
-
     def _info(self):
-         # TODO: Specifies the evaluate.EvaluationModuleInfo object
         return evaluate.MetricInfo(
-             # This is the description that will appear on the modules page.
             module_type="metric",
             description=_DESCRIPTION,
             citation=_CITATION,
             inputs_description=_KWARGS_DESCRIPTION,
             # This defines the format of each prediction and reference
-             features=datasets.Features({
-                 'predictions': datasets.Value('int64'),
-                 'references': datasets.Value('int64'),
-             }),
-             # Homepage of the module for documentation
-             homepage="http://module.homepage",
-             # Additional links to the codebase or references
-             codebase_urls=["http://github.com/path/to/codebase/of/new_module"],
-             reference_urls=["http://path.to.reference.url/new_module"]
         )

-     def _download_and_prepare(self, dl_manager):
-         """Optional: download external resources useful to compute the scores"""
-         # TODO: Download external resources if needed
-         pass
-
-     def _compute(self, predictions, references):
-         """Returns the scores"""
-         # TODO: Compute the different scores of the module
-         accuracy = sum(i == j for i, j in zip(predictions, references)) / len(predictions)
-         return {
-             "accuracy": accuracy,
-         }
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+ """Geometric mean metric."""

 import datasets
+ from imblearn.metrics import geometric_mean_score
+ import evaluate

+ _DESCRIPTION = """
+ The geometric mean (G-mean) is the root of the product of class-wise sensitivity. This measure
+ tries to maximize the accuracy on each of the classes while keeping these accuracies balanced. For binary
+ classification, G-mean is the square root of the product of the sensitivity and specificity. For multi-class problems
+ it is a higher root of the product of sensitivity for each class.
 """

 _KWARGS_DESCRIPTION = """
 Calculates how good are predictions given some references, using certain scores
 Args:
+     predictions (`list` of `int`): Predicted labels.
+     references (`list` of `int`): Ground truth labels.
+     labels (`list` of `int`): The set of labels to include when average != 'binary', and their order if average is None. Labels present in the data can be excluded, for example to calculate a multiclass average ignoring a majority negative class, while labels not present in the data will result in 0 components in a macro average. Defaults to None.
+     pos_label (`string` or `int`): The class to report if average='binary' and the data is binary. If the data are multiclass, this will be ignored; setting labels=[pos_label] and average != 'binary' will report scores for that label only. Defaults to 1.
+     average (`string`): If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data. Defaults to `'multiclass'`.
+
+         - 'binary': Only report results for the class specified by pos_label. This is applicable only if the targets (y_{true,pred}) are binary.
+         - 'micro': Calculate metrics globally by counting the total true positives, false negatives and false positives.
+         - 'macro': Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
+         - 'weighted': Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label).
+         - 'samples': Calculate metrics for each instance, and find their average (only meaningful for multilabel classification, where this differs from accuracy_score).
+
+     sample_weight (`list` of `float`): Sample weights. Defaults to None.
+     correction (`float`): Substitutes the sensitivity of unrecognized classes from zero to a given value. Defaults to 0.0.
+
 Returns:
+     geometric_mean (`float` or `array` of `float`): geometric mean score or list of geometric mean scores, depending on the value passed to `average`. Minimum possible value is 0. Maximum possible value is 1. Higher geometric mean scores are better.

+ Examples:
+     Example 1: A simple binary example
+         >>> geometric_mean = evaluate.load("geometric_mean")
+         >>> results = geometric_mean.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0])
+         >>> print(round(results['geometric-mean'], 2))
+         0.58
+     Example 2: The same simple binary example as in Example 1, but with `sample_weight` included.
+         >>> geometric_mean = evaluate.load("geometric_mean")
+         >>> results = geometric_mean.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0], sample_weight=[0.9, 0.5, 3.9, 1.2, 0.3])
+         >>> print(round(results['geometric-mean'], 2))
+         0.35
+     Example 3: A multiclass example, with `average` equal to `macro`.
+         >>> geometric_mean = evaluate.load("geometric_mean")
+         >>> predictions = [0, 2, 1, 0, 0, 1]
+         >>> references = [0, 1, 2, 0, 1, 2]
+         >>> results = geometric_mean.compute(predictions=predictions, references=references, average="macro")
+         >>> print(round(results['geometric-mean'], 2))
+         0.47
 """

+ _CITATION = """
+ @article{imbalanced-learn,
+   title={Imbalanced-learn: A Python Toolbox to Tackle the Curse of
+          Imbalanced Datasets in Machine Learning},
+   author={Lemaître, G. and Nogueira, F. and Aridas, C.},
+   journal={Journal of Machine Learning Research},
+   volume={18},
+   pages={1-5},
+   year={2017}
+ }
+ """


 @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
 class GeometricMean(evaluate.Metric):
     def _info(self):
         return evaluate.MetricInfo(
             module_type="metric",
             description=_DESCRIPTION,
             citation=_CITATION,
             inputs_description=_KWARGS_DESCRIPTION,
             # This defines the format of each prediction and reference
+             features=datasets.Features(
+                 {
+                     "predictions": datasets.Sequence(datasets.Value("int32")),
+                     "references": datasets.Sequence(datasets.Value("int32")),
+                 }
+                 if self.config_name == "multilabel"
+                 else {
+                     "predictions": datasets.Value("int32"),
+                     "references": datasets.Value("int32"),
+                 }
+             ),
+             reference_urls=["http://glemaitre.github.io/imbalanced-learn/generated/imblearn.metrics.geometric_mean_score.html#:~:text=The%20geometric%20mean%20(G%2Dmean,of%20the%20sensitivity%20and%20specificity."],
         )

+     def _compute(self, predictions, references, labels=None, pos_label=1, average="multiclass", sample_weight=None, correction=0.0):
+         score = geometric_mean_score(
+             references, predictions, labels=labels, pos_label=pos_label, average=average, sample_weight=sample_weight, correction=correction
+         )
+         return {"geometric-mean": float(score) if score.size == 1 else score}
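Note that `_compute` returns a plain `float` for averaged scores and passes the per-class array through unchanged when `average=None`. A rough sketch of both cases, assuming the script is loaded from a local copy of this file:

```python
import evaluate

# Assumes a local checkout; loading the module by file path is an assumption here.
gmean = evaluate.load("./geometric_mean.py")

preds = [0, 2, 1, 0, 0, 1]
refs = [0, 1, 2, 0, 1, 2]

# Averaged score -> {'geometric-mean': <float>}
print(gmean.compute(predictions=preds, references=refs, average="macro"))

# average=None -> {'geometric-mean': <array with one entry per class>}
print(gmean.compute(predictions=preds, references=refs, average=None))
```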
 
 
 
 
 
 
 
requirements.txt CHANGED
@@ -1,2 +1,3 @@
 git+https://github.com/huggingface/evaluate@a45df1eb9996eec64ec3282ebe554061cb366388
- datasets~=2.0

 git+https://github.com/huggingface/evaluate@a45df1eb9996eec64ec3282ebe554061cb366388
+ datasets~=2.0
+ imblearn==0.0
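The `imblearn` package on PyPI is a thin shim that pulls in `imbalanced-learn`, which provides the actual `imblearn` import used by the module. A quick check that the new dependency resolves at runtime (assuming the requirements above are installed):

```python
# Import check for the new dependency (illustrative only).
from imblearn.metrics import geometric_mean_score

print(geometric_mean_score([0, 1], [0, 1]))  # expected: 1.0
```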
tests.py CHANGED
@@ -1,17 +1,19 @@
- test_cases = [
-     {
-         "predictions": [0, 0],
-         "references": [1, 1],
-         "result": {"metric_score": 0}
-     },
-     {
-         "predictions": [1, 1],
-         "references": [1, 1],
-         "result": {"metric_score": 1}
-     },
-     {
-         "predictions": [1, 0],
-         "references": [1, 1],
-         "result": {"metric_score": 0.5}
-     }
- ]

+ import unittest
+
+ from metrics.geometric_mean.geometric_mean import GeometricMean
+
+ geometric_mean = GeometricMean()
+
+
+ class TestGeometricMean(unittest.TestCase):
+     def test_gmean(self):
+         refs = [0, 1, 2, 0, 1, 2]
+         preds = [0, 1, 2, 0, 1, 2]
+         geometric_mean_score = geometric_mean.compute(predictions=preds, references=refs)
+         print(geometric_mean_score)
+         self.assertTrue(geometric_mean_score == {'geometric-mean': 1.0})
+
+         refs = [0, 2, 1, 0, 0, 1]
+         preds = [0, 1, 2, 0, 1, 2]
+         geometric_mean_score = geometric_mean.compute(predictions=preds, references=refs)
+         self.assertTrue(geometric_mean_score == {'geometric-mean': 0.0})
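Since the module delegates to imblearn (with references passed first, as `y_true`), the expected values above can be cross-checked against `geometric_mean_score` directly. A minimal sketch, separate from the committed tests:

```python
from imblearn.metrics import geometric_mean_score

# Perfect predictions -> G-mean of 1.0 (first case above).
assert geometric_mean_score([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2]) == 1.0

# Two classes are never predicted correctly -> G-mean of 0.0 (second case above).
assert geometric_mean_score([0, 2, 1, 0, 0, 1], [0, 1, 2, 0, 1, 2]) == 0.0
```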