lvwerra HF staff committed on
Commit
241fc76
1 Parent(s): 705cd5c

Update Space (evaluate main: 828c6327)

Files changed (4)
  1. README.md +110 -5
  2. app.py +6 -0
  3. exact_match.py +137 -0
  4. requirements.txt +3 -0
README.md CHANGED
@@ -1,12 +1,117 @@
  ---
- title: Exact_match
- emoji: 🌍
- colorFrom: gray
- colorTo: purple
  sdk: gradio
  sdk_version: 3.0.2
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference
  ---
+ title: Exact Match
+ emoji: 🤗
+ colorFrom: blue
+ colorTo: red
  sdk: gradio
  sdk_version: 3.0.2
  app_file: app.py
  pinned: false
+ tags:
+ - evaluate
+ - metric
  ---

+ # Metric Card for Exact Match
+
+
+ ## Metric Description
+ A given predicted string's exact match score is 1 if it is exactly the same as its reference string, and 0 otherwise.
+
+ - **Example 1**: The exact match score of the prediction "Happy Birthday!" is 0, given that its reference is "Happy New Year!".
+ - **Example 2**: The exact match score of the prediction "The Colour of Magic (1983)" is 1, given that its reference is also "The Colour of Magic (1983)".
+
+ The exact match score of a set of predictions is the sum of the individual exact match scores in the set, divided by the total number of predictions in the set.
+
+ - **Example**: The exact match score of the set {Example 1, Example 2} (above) is 0.5.
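+
+ As a minimal plain-Python sketch of this computation (illustrative only; the helper name is made up, and the actual module, described below, additionally supports several normalization options):
+
+ ```python
+ >>> def naive_exact_match(predictions, references):
+ ...     # average of per-pair scores: 1 if the strings are identical, else 0
+ ...     return sum(p == r for p, r in zip(predictions, references)) / len(predictions)
+ >>> naive_exact_match(["Happy Birthday!", "The Colour of Magic (1983)"],
+ ...                   ["Happy New Year!", "The Colour of Magic (1983)"])
+ 0.5
+ ```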
+
+
+ ## How to Use
+ At minimum, this metric takes as input a list of predictions and a list of references:
+ ```python
+ >>> from evaluate import load
+ >>> exact_match_metric = load("exact_match")
+ >>> results = exact_match_metric.compute(predictions=predictions, references=references)
+ ```
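+
+ For instance, with a pair of toy lists (the values here are purely illustrative):
+
+ ```python
+ >>> predictions = ["hello world", "good night moon"]
+ >>> references = ["hello world", "good night moon"]
+ >>> results = exact_match_metric.compute(predictions=predictions, references=references)
+ >>> print(results["exact_match"])
+ 100.0
+ ```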
+
+ ### Inputs
+ - **`predictions`** (`list` of `str`): List of predicted texts.
+ - **`references`** (`list` of `str`): List of reference texts.
+ - **`regexes_to_ignore`** (`list` of `str`): Regex expressions of characters to ignore when calculating the exact matches. Defaults to `None`. Note: the regex substitutions are applied before capitalization is normalized (see the sketch after this list).
+ - **`ignore_case`** (`bool`): If `True`, turns everything to lowercase so that capitalization differences are ignored. Defaults to `False`.
+ - **`ignore_punctuation`** (`bool`): If `True`, removes punctuation before comparing strings. Defaults to `False`.
+ - **`ignore_numbers`** (`bool`): If `True`, removes all digits before comparing strings. Defaults to `False`.
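+
+ The order of these preprocessing steps matters. Here is a rough sketch of how a single string is normalized under these options (a simplification of the module's vectorized implementation; the `normalize` helper is illustrative, not part of the API):
+
+ ```python
+ >>> import re, string
+ >>> def normalize(text, regexes_to_ignore=(), ignore_case=False, ignore_punctuation=False, ignore_numbers=False):
+ ...     for pattern in regexes_to_ignore:  # regex removal happens first
+ ...         text = re.sub(pattern, "", text)
+ ...     if ignore_case:
+ ...         text = text.lower()
+ ...     if ignore_punctuation:
+ ...         text = text.translate(str.maketrans("", "", string.punctuation))
+ ...     if ignore_numbers:
+ ...         text = text.translate(str.maketrans("", "", string.digits))
+ ...     return text
+ >>> normalize("YELLING!", regexes_to_ignore=["yell"], ignore_case=True, ignore_punctuation=True)
+ 'yelling'
+ ```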
+
+
+ ### Output Values
+ This metric outputs a dictionary with one value: the average exact match score.
+
+ ```python
+ {'exact_match': 100.0}
+ ```
+
+ This metric's range is 0-100, inclusive. Here, 0.0 means no prediction/reference pairs matched, while 100.0 means they all did.
+
+ #### Values from Popular Papers
+ The exact match metric is often included in other metrics, such as SQuAD. For example, the [original SQuAD paper](https://nlp.stanford.edu/pubs/rajpurkar2016squad.pdf) reported an Exact Match score of 40.0%. They also report that the human performance Exact Match score on the dataset was 80.3%.
57
+
58
+ ### Examples
59
+ Without including any regexes to ignore:
60
+ ```python
61
+ >>> exact_match = evaluate.load("exact_match")
62
+ >>> refs = ["the cat", "theater", "YELLING", "agent007"]
63
+ >>> preds = ["cat?", "theater", "yelling", "agent"]
64
+ >>> results = exact_match.compute(references=refs, predictions=preds)
65
+ >>> print(round(results["exact_match"], 1))
66
+ 25.0
67
+ ```
68
+
69
+ Ignoring regexes "the" and "yell", as well as ignoring case and punctuation:
70
+ ```python
71
+ >>> exact_match = evaluate.load("exact_match")
72
+ >>> refs = ["the cat", "theater", "YELLING", "agent007"]
73
+ >>> preds = ["cat?", "theater", "yelling", "agent"]
74
+ >>> results = exact_match.compute(references=refs, predictions=preds, regexes_to_ignore=["the ", "yell"], ignore_case=True, ignore_punctuation=True)
75
+ >>> print(round(results["exact_match"], 1))
76
+ 50.0
77
+ ```
78
+ Note that in the example above, because the regexes are ignored before the case is normalized, "yell" from "YELLING" is not deleted.
79
+
80
+ Ignoring "the", "yell", and "YELL", as well as ignoring case and punctuation:
81
+ ```python
82
+ >>> exact_match = evaluate.load("exact_match")
83
+ >>> refs = ["the cat", "theater", "YELLING", "agent007"]
84
+ >>> preds = ["cat?", "theater", "yelling", "agent"]
85
+ >>> results = exact_match.compute(references=refs, predictions=preds, regexes_to_ignore=["the ", "yell", "YELL"], ignore_case=True, ignore_punctuation=True)
86
+ >>> print(round(results["exact_match"], 1))
87
+ 75.0
88
+ ```
89
+
90
+ Ignoring "the", "yell", and "YELL", as well as ignoring case, punctuation, and numbers:
91
+ ```python
92
+ >>> exact_match = evaluate.load("exact_match")
93
+ >>> refs = ["the cat", "theater", "YELLING", "agent007"]
94
+ >>> preds = ["cat?", "theater", "yelling", "agent"]
95
+ >>> results = exact_match.compute(references=refs, predictions=preds, regexes_to_ignore=["the ", "yell", "YELL"], ignore_case=True, ignore_punctuation=True, ignore_numbers=True)
96
+ >>> print(round(results["exact_match"], 1))
97
+ 100.0
98
+ ```
99
+
100
+ An example that includes sentences:
101
+ ```python
102
+ >>> exact_match = evaluate.load("exact_match")
103
+ >>> refs = ["The cat sat on the mat.", "Theaters are great.", "It's like comparing oranges and apples."]
104
+ >>> preds = ["The cat sat on the mat?", "Theaters are great.", "It's like comparing apples and oranges."]
105
+ >>> results = exact_match.compute(references=refs, predictions=preds)
106
+ >>> print(round(results["exact_match"], 1))
107
+ 33.3
108
+ ```
+
+
+ ## Limitations and Bias
+ This metric is limited in that it outputs the same score for a prediction that is completely wrong as for one that is correct except for a single character. In other words, there is no credit for being *almost* right.
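+
+ A quick illustration (the strings below are chosen to echo the sentence example above):
+ ```python
+ >>> exact_match = evaluate.load("exact_match")
+ >>> results = exact_match.compute(references=["The cat sat on the mat."], predictions=["The cat sat on the mat"])
+ >>> print(round(results["exact_match"], 1))
+ 0.0
+ ```
+ Here the prediction differs from its reference only by the final period, yet it receives the same 0.0 that a completely unrelated string would.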
+
+ ## Citation
+
+ ## Further References
+ - Also used in the [SQuAD metric](https://github.com/huggingface/datasets/tree/master/metrics/squad)
app.py ADDED
@@ -0,0 +1,6 @@
+ import evaluate
+ from evaluate.utils import launch_gradio_widget
+
+
+ module = evaluate.load("exact_match")
+ launch_gradio_widget(module)
exact_match.py ADDED
@@ -0,0 +1,137 @@
+ # Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """Exact Match metric."""
+ import re
+ import string
+
+ import datasets
+ import numpy as np
+
+ import evaluate
+
+
+ _DESCRIPTION = """
+ Returns the rate at which the input predicted strings exactly match their references, ignoring any strings input as part of the regexes_to_ignore list.
+ """
+
+ _KWARGS_DESCRIPTION = """
+ Args:
+     predictions: List of predicted texts.
+     references: List of reference texts.
+     regexes_to_ignore: List, defaults to None. Regex expressions of characters to
+         ignore when calculating the exact matches. Note: these regexes are removed
+         from the input data before the changes based on the options below (e.g. ignore_case,
+         ignore_punctuation, ignore_numbers) are applied.
+     ignore_case: Boolean, defaults to False. If true, turns everything
+         to lowercase so that capitalization differences are ignored.
+     ignore_punctuation: Boolean, defaults to False. If true, removes all punctuation before
+         comparing predictions and references.
+     ignore_numbers: Boolean, defaults to False. If true, removes all digits before
+         comparing predictions and references.
+ Returns:
+     exact_match: Dictionary containing exact_match rate. Possible values are between 0.0 and 100.0, inclusive.
+ Examples:
+     >>> exact_match = evaluate.load("exact_match")
+     >>> refs = ["the cat", "theater", "YELLING", "agent007"]
+     >>> preds = ["cat?", "theater", "yelling", "agent"]
+     >>> results = exact_match.compute(references=refs, predictions=preds)
+     >>> print(round(results["exact_match"], 1))
+     25.0
+
+     >>> exact_match = evaluate.load("exact_match")
+     >>> refs = ["the cat", "theater", "YELLING", "agent007"]
+     >>> preds = ["cat?", "theater", "yelling", "agent"]
+     >>> results = exact_match.compute(references=refs, predictions=preds, regexes_to_ignore=["the ", "yell"], ignore_case=True, ignore_punctuation=True)
+     >>> print(round(results["exact_match"], 1))
+     50.0
+
+     >>> exact_match = evaluate.load("exact_match")
+     >>> refs = ["the cat", "theater", "YELLING", "agent007"]
+     >>> preds = ["cat?", "theater", "yelling", "agent"]
+     >>> results = exact_match.compute(references=refs, predictions=preds, regexes_to_ignore=["the ", "yell", "YELL"], ignore_case=True, ignore_punctuation=True)
+     >>> print(round(results["exact_match"], 1))
+     75.0
+
+     >>> exact_match = evaluate.load("exact_match")
+     >>> refs = ["the cat", "theater", "YELLING", "agent007"]
+     >>> preds = ["cat?", "theater", "yelling", "agent"]
+     >>> results = exact_match.compute(references=refs, predictions=preds, regexes_to_ignore=["the ", "yell", "YELL"], ignore_case=True, ignore_punctuation=True, ignore_numbers=True)
+     >>> print(round(results["exact_match"], 1))
+     100.0
+
+     >>> exact_match = evaluate.load("exact_match")
+     >>> refs = ["The cat sat on the mat.", "Theaters are great.", "It's like comparing oranges and apples."]
+     >>> preds = ["The cat sat on the mat?", "Theaters are great.", "It's like comparing apples and oranges."]
+     >>> results = exact_match.compute(references=refs, predictions=preds)
+     >>> print(round(results["exact_match"], 1))
+     33.3
+ """
+
+ _CITATION = """
+ """
+
+
+ @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
+ class ExactMatch(evaluate.EvaluationModule):
+     def _info(self):
+         return evaluate.EvaluationModuleInfo(
+             description=_DESCRIPTION,
+             citation=_CITATION,
+             inputs_description=_KWARGS_DESCRIPTION,
+             features=datasets.Features(
+                 {
+                     "predictions": datasets.Value("string", id="sequence"),
+                     "references": datasets.Value("string", id="sequence"),
+                 }
+             ),
+             reference_urls=[],
+         )
+
+     def _compute(
+         self,
+         predictions,
+         references,
+         regexes_to_ignore=None,
+         ignore_case=False,
+         ignore_punctuation=False,
+         ignore_numbers=False,
+     ):
+
+         if regexes_to_ignore is not None:
+             # strip every ignored pattern from both sides before any other normalization
+             for s in regexes_to_ignore:
+                 predictions = np.array([re.sub(s, "", x) for x in predictions])
+                 references = np.array([re.sub(s, "", x) for x in references])
+         else:
+             predictions = np.asarray(predictions)
+             references = np.asarray(references)
+
+         if ignore_case:
+             predictions = np.char.lower(predictions)
+             references = np.char.lower(references)
+
+         if ignore_punctuation:
+             # translation table that deletes every punctuation character
+             repl_table = str.maketrans("", "", string.punctuation)
+             predictions = np.char.translate(predictions, table=repl_table)
+             references = np.char.translate(references, table=repl_table)
+
+         if ignore_numbers:
+             # translation table that deletes every digit
+             repl_table = str.maketrans("", "", string.digits)
+             predictions = np.char.translate(predictions, table=repl_table)
+             references = np.char.translate(references, table=repl_table)
+
+         # element-wise equality gives the per-pair 0/1 scores; their mean is the match rate
+         score_list = predictions == references
+
+         return {"exact_match": np.mean(score_list) * 100}
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ # TODO: fix github to release
+ git+https://github.com/huggingface/evaluate.git@b6e6ed7f3e6844b297bff1b43a1b4be0709b9671
+ datasets~=2.0