Metric: gleu

#### Description

The GLEU metric is a variant of BLEU proposed for evaluating grammatical error correction. It uses n-gram overlap with a set of reference sentences, as opposed to precision/recall of specific annotated errors (Napoles et al., 2015). The system rankings produced by GLEU agree more closely with human judgments than those produced by metrics such as MaxMatch and I-measure. The present metric is the second version of GLEU (Napoles et al., 2016). It was modified to address problems that arise as the number of reference sets increases. The modified metric does not require tuning and is recommended over the original version.
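The sketch below illustrates the n-gram idea for a single sentence. It is not the implementation loaded by `load_metric("gleu")`.

It rests on one assumption that this card does not state. We assume GLEU also looks at the uncorrected source sentence: for each order n, it credits hypothesis n-grams that match the reference and subtracts those matching source n-grams the reference dropped.

The scores are then combined BLEU-style, using a geometric mean with a brevity penalty. Further simplifications:

* The helpers `ngrams` and `gleu_sketch` are hypothetical.
* Tokenization is plain whitespace splitting.
* There is only one reference and no smoothing.
* The multi-reference handling of the 2016 version is omitted.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def gleu_sketch(source, hypothesis, reference, max_order=4):
    """Hypothetical sentence-level GLEU-style score (single reference, no smoothing).

    Assumed formulation: per order n, count hypothesis n-grams matching the
    reference, subtract hypothesis n-grams matching source n-grams that the
    reference does not contain, and divide by the number of hypothesis n-grams.
    """
    src, hyp, ref = source.split(), hypothesis.split(), reference.split()
    log_precision = 0.0
    for n in range(1, max_order + 1):
        h = ngrams(hyp, n)
        r = ngrams(ref, n)
        # Source n-grams absent from the reference, i.e. ones a correction should remove.
        s_not_r = Counter({g: c for g, c in ngrams(src, n).items() if g not in r})
        matched = sum((h & r).values()) - sum((h & s_not_r).values())
        total = max(len(hyp) + 1 - n, 0)
        if matched <= 0 or total == 0:
            return 0.0  # any zero precision zeroes the geometric mean
        log_precision += math.log(matched / total) / max_order
    # BLEU-style brevity penalty (in log space) for hypotheses shorter than the reference.
    brevity = min(0.0, 1.0 - len(ref) / len(hyp))
    return math.exp(brevity + log_precision)


source = "she go to school"
reference = "she goes to school"
print(gleu_sketch(source, "she goes to school", reference))  # 1.0
print(gleu_sketch(source, "she go to school", reference))    # 0.0
```

Here the corrected hypothesis matches the reference exactly and scores 1.0. Leaving the source unchanged scores 0.0, because copying the erroneous n-grams is penalized rather than rewarded.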

How to load this metric directly with the datasets library:

```python
from datasets import load_metric

metric = load_metric("gleu")
```

#### Citation

```bibtex
@InProceedings{napoles-EtAl:2015:ACL-IJCNLP,
  author    = {Napoles, Courtney and Sakaguchi, Keisuke and Post, Matt and Tetreault, Joel},
  title     = {Ground Truth for Grammatical Error Correction Metrics},
  booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
  month     = {July},
  year      = {2015},
  address   = {Beijing, China},
  publisher = {Association for Computational Linguistics},
  pages     = {588--593},
  url       = {http://www.aclweb.org/anthology/P15-2097}
}

@Article{napoles2016gleu,
  author    = {Napoles, Courtney and Sakaguchi, Keisuke and Post, Matt and Tetreault, Joel},
  title     = {{GLEU} Without Tuning},
  journal   = {eprint arXiv:1605.02592 [cs.CL]},
  year      = {2016},
  url       = {http://arxiv.org/abs/1605.02592}
}
```