Metric: gleu
Description
The GLEU metric is a variant of BLEU proposed for evaluating grammatical error corrections using n-gram overlap with a set of reference sentences, as opposed to precision/recall of specific annotated errors (Napoles et al., 2015). GLEU correlates more closely with human judgments than the rankings produced by metrics such as MaxMatch and I-measure. This implementation is the second version of GLEU (Napoles et al., 2016), updated to address problems that arise as the number of reference sets grows. The updated metric requires no tuning and is recommended over the original version.
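To make the n-gram overlap concrete, here is a minimal sentence-level sketch in the spirit of the updated GLEU: BLEU-style clipped n-gram precision against a reference, with hypothesis n-grams that also appear in the uncorrected source but not in the reference subtracted from the match count. The function names are illustrative, this is not the library's implementation, and it simplifies the published metric (single reference, equal n-gram weights):

```python
from collections import Counter
import math

def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_gleu(source, hypothesis, reference, max_n=4):
    """Simplified GLEU sketch (not the reference implementation):
    clipped n-gram precision against the reference, penalizing n-grams
    the hypothesis shares with the source but not with the reference."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        src = ngram_counts(source, n)
        # Clipped matches with the reference, as in BLEU.
        matches = sum(min(c, ref[g]) for g, c in hyp.items())
        # Penalty for n-grams matching the uncorrected source but not the reference.
        penalty = sum(max(0, min(c, src[g]) - min(c, ref[g])) for g, c in hyp.items())
        total = max(sum(hyp.values()), 1)
        p_n = max(matches - penalty, 0) / total
        if p_n == 0:
            return 0.0
        log_precisions.append(math.log(p_n))
    # Brevity penalty, as in BLEU.
    bp = min(1.0, math.exp(1 - len(reference) / max(len(hypothesis), 1)))
    return bp * math.exp(sum(log_precisions) / max_n)
```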
How to load this metric directly with the `datasets` library:
```python
from datasets import load_metric

metric = load_metric("gleu")
```
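A hypothetical usage example, assuming the metric follows the standard `datasets` metric interface of `compute(predictions=..., references=...)`; the exact input format (raw strings vs. token lists, and how the uncorrected source sentence is supplied) may differ in the actual implementation:

```python
# Illustrative inputs only; check the metric card for the expected format.
predictions = ["the cat sat on the mat"]
references = [["the cat sat on the mat"]]  # one list of references per prediction

results = metric.compute(predictions=predictions, references=references)
print(results)
```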
Citation
```bibtex
@InProceedings{napoles-EtAl:2015:ACL-IJCNLP,
  author    = {Napoles, Courtney and Sakaguchi, Keisuke and Post, Matt and Tetreault, Joel},
  title     = {Ground Truth for Grammatical Error Correction Metrics},
  booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
  month     = {July},
  year      = {2015},
  address   = {Beijing, China},
  publisher = {Association for Computational Linguistics},
  pages     = {588--593},
  url       = {http://www.aclweb.org/anthology/P15-2097}
}

@Article{napoles2016gleu,
  author  = {Napoles, Courtney and Sakaguchi, Keisuke and Post, Matt and Tetreault, Joel},
  title   = {{GLEU} Without Tuning},
  journal = {eprint arXiv:1605.02592 [cs.CL]},
  year    = {2016},
  url     = {http://arxiv.org/abs/1605.02592}
}
```