Metric: bertscore 📉

How to load this metric directly with the 🤗/nlp library:

			
from nlp import load_metric
metric = load_metric("bertscore")

Description

BERTScore uses pre-trained contextual embeddings from BERT and matches words in the candidate and reference sentences by cosine similarity. It has been shown to correlate with human judgment in both sentence-level and system-level evaluation. BERTScore also computes precision, recall, and F1 measures, which can be useful for evaluating different language generation tasks. See the README.md file at https://github.com/Tiiiger/bert_score for more information.
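The matching step described above can be illustrated with a minimal sketch. It is not the bert_score implementation: the function names `cosine` and `greedy_bertscore` are hypothetical, the hand-written 2-d vectors stand in for real BERT embeddings, and optional refinements such as idf weighting or baseline rescaling are left out. Each candidate token is matched to its most similar reference token (precision), each reference token to its most similar candidate token (recall), and F1 is their harmonic mean.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_bertscore(candidate_vecs, reference_vecs):
    """Greedy-matching precision, recall and F1 over token vectors.

    Hypothetical helper for illustration only; real BERTScore obtains
    the vectors from a pre-trained model.
    """
    # sim[i][j] = similarity of candidate token i and reference token j
    sim = [[cosine(c, r) for r in reference_vecs] for c in candidate_vecs]

    # Precision: best reference match for each candidate token, averaged.
    precision = sum(max(row) for row in sim) / len(candidate_vecs)

    # Recall: best candidate match for each reference token, averaged.
    recall = sum(
        max(sim[i][j] for i in range(len(candidate_vecs)))
        for j in range(len(reference_vecs))
    ) / len(reference_vecs)

    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy vectors (assumed values, not real embeddings).
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

p, r, f = greedy_bertscore(candidate, reference)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

In this toy case every candidate token has an exact match in the reference, so precision is perfect, while the third reference token is only partially covered, which lowers recall and therefore F1.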

Citation

@inproceedings{bert-score,
  title={BERTScore: Evaluating Text Generation with BERT},
  author={Tianyi Zhang* and Varsha Kishore* and Felix Wu* and Kilian Q. Weinberger and Yoav Artzi},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://openreview.net/forum?id=SkeHuCVFDr}
}