---
title: Exact Match
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
  Returns the rate at which the input predicted strings exactly match their
  references, ignoring any strings input as part of the regexes_to_ignore list.
---
# Metric Card for Exact Match

## Metric Description

A given predicted string's exact match score is 1 if it is exactly the same as its reference string, and 0 otherwise.
- Example 1: The exact match score of prediction "Happy Birthday!" is 0, given its reference is "Happy New Year!".
- Example 2: The exact match score of prediction "The Colour of Magic (1983)" is 1, given its reference is also "The Colour of Magic (1983)".
The exact match score of a set of predictions is the sum of all of the individual exact match scores in the set, divided by the total number of predictions in the set.
- Example: The exact match score of the set {Example 1, Example 2} (above) is 0.5.
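The averaging described above can be sketched in a few lines. This is a minimal illustration of the definition, not the library's internal implementation (the helper name `exact_match_score` is invented here):

```python
def exact_match_score(predictions, references):
    """Mean of per-pair exact matches: 1 if the strings are identical, else 0."""
    matches = [int(pred == ref) for pred, ref in zip(predictions, references)]
    return sum(matches) / len(matches)

# The two examples above: one mismatch, one exact match -> 0.5
score = exact_match_score(
    ["Happy Birthday!", "The Colour of Magic (1983)"],
    ["Happy New Year!", "The Colour of Magic (1983)"],
)
print(score)  # 0.5
```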
## How to Use
At minimum, this metric takes as input predictions and references:
```python
>>> from evaluate import load
>>> exact_match_metric = load("exact_match")
>>> results = exact_match_metric.compute(predictions=predictions, references=references)
```
### Inputs
- **`predictions`** (`list` of `str`): List of predicted texts.
- **`references`** (`list` of `str`): List of reference texts.
- **`regexes_to_ignore`** (`list` of `str`): Regex expressions of characters to ignore when calculating the exact matches. Defaults to `None`. Note: the regexes are applied before capitalization is normalized.
- **`ignore_case`** (`bool`): If `True`, turns everything to lowercase so that capitalization differences are ignored. Defaults to `False`.
- **`ignore_punctuation`** (`bool`): If `True`, removes punctuation before comparing strings. Defaults to `False`.
- **`ignore_numbers`** (`bool`): If `True`, removes all digits before comparing strings. Defaults to `False`.
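The documented preprocessing order (regexes first, then case, punctuation, and digit handling) can be sketched as follows. This is an illustrative approximation, not the library's exact code, and the helper name `normalize` is invented here:

```python
import re
import string

def normalize(text, regexes_to_ignore=None, ignore_case=False,
              ignore_punctuation=False, ignore_numbers=False):
    # Regexes are applied first, before any case normalization.
    if regexes_to_ignore:
        for pattern in regexes_to_ignore:
            text = re.sub(pattern, "", text)
    if ignore_case:
        text = text.lower()
    if ignore_punctuation:
        text = text.translate(str.maketrans("", "", string.punctuation))
    if ignore_numbers:
        text = text.translate(str.maketrans("", "", string.digits))
    return text

# "yell" does not match "YELLING" (regexes run before lowercasing),
# so the substring survives:
print(normalize("YELLING", ["yell"], ignore_case=True))  # yelling
```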
### Output Values
This metric outputs a dictionary with one value: the average exact match score.
```python
{'exact_match': 1.0}
```
This metric's range is 0-1, inclusive. Here, 0.0 means no prediction/reference pairs were matches, while 1.0 means they all were.
### Values from Popular Papers
The exact match metric is often included as a component of other metrics, such as the SQuAD metric. For example, the original SQuAD paper reported an Exact Match score of 40.0%, and a human performance Exact Match score of 80.3% on the dataset.
### Examples
Without including any regexes to ignore:
```python
>>> exact_match = evaluate.load("exact_match")
>>> refs = ["the cat", "theater", "YELLING", "agent007"]
>>> preds = ["cat?", "theater", "yelling", "agent"]
>>> results = exact_match.compute(references=refs, predictions=preds)
>>> print(round(results["exact_match"], 2))
0.25
```
Ignoring regexes "the" and "yell", as well as ignoring case and punctuation:
```python
>>> exact_match = evaluate.load("exact_match")
>>> refs = ["the cat", "theater", "YELLING", "agent007"]
>>> preds = ["cat?", "theater", "yelling", "agent"]
>>> results = exact_match.compute(references=refs, predictions=preds, regexes_to_ignore=["the ", "yell"], ignore_case=True, ignore_punctuation=True)
>>> print(round(results["exact_match"], 2))
0.5
```
Note that in the example above, because the regexes are applied before case is normalized, the "yell" in "YELLING" is not deleted.
Ignoring "the", "yell", and "YELL", as well as ignoring case and punctuation:
```python
>>> exact_match = evaluate.load("exact_match")
>>> refs = ["the cat", "theater", "YELLING", "agent007"]
>>> preds = ["cat?", "theater", "yelling", "agent"]
>>> results = exact_match.compute(references=refs, predictions=preds, regexes_to_ignore=["the ", "yell", "YELL"], ignore_case=True, ignore_punctuation=True)
>>> print(round(results["exact_match"], 2))
0.75
```
Ignoring "the", "yell", and "YELL", as well as ignoring case, punctuation, and numbers:
```python
>>> exact_match = evaluate.load("exact_match")
>>> refs = ["the cat", "theater", "YELLING", "agent007"]
>>> preds = ["cat?", "theater", "yelling", "agent"]
>>> results = exact_match.compute(references=refs, predictions=preds, regexes_to_ignore=["the ", "yell", "YELL"], ignore_case=True, ignore_punctuation=True, ignore_numbers=True)
>>> print(round(results["exact_match"], 2))
1.0
```
An example that includes sentences:
```python
>>> exact_match = evaluate.load("exact_match")
>>> refs = ["The cat sat on the mat.", "Theaters are great.", "It's like comparing oranges and apples."]
>>> preds = ["The cat sat on the mat?", "Theaters are great.", "It's like comparing apples and oranges."]
>>> results = exact_match.compute(references=refs, predictions=preds)
>>> print(round(results["exact_match"], 2))
0.33
```
## Limitations and Bias

This metric is limited in that it assigns the same score to a prediction that is completely wrong as to one that is correct except for a single character. In other words, there is no partial credit for being almost right.
## Citation
## Further References
- Also used in the SQuAD metric