Metric:
sacrebleu
Hosted in S3
Description
SacreBLEU provides hassle-free computation of shareable, comparable, and reproducible BLEU scores. Inspired by Rico Sennrich's `multi-bleu-detok.perl`, it produces the official WMT scores but works with plain text. It also knows all the standard test sets and handles downloading, processing, and tokenization for you. See the [README.md] file at https://github.com/mjpost/sacreBLEU for more information.
How to load this metric directly with the
datasets
library:
from datasets import load_metric metric = load_metric("sacrebleu")
Citation
@inproceedings{post-2018-call, title = "A Call for Clarity in Reporting {BLEU} Scores", author = "Post, Matt", booktitle = "Proceedings of the Third Conference on Machine Translation: Research Papers", month = oct, year = "2018", address = "Belgium, Brussels", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/W18-6319", pages = "186--191", }