Metric: bleurt

How to load this metric directly with the 🤗/datasets library:

from datasets import load_metric
metric = load_metric("bleurt")


BLEURT is a learnt evaluation metric for Natural Language Generation. It is built using multiple phases of transfer learning: it starts from a pretrained BERT model (Devlin et al. 2018), goes through a further pre-training phase on synthetic data, and is finally trained on WMT human annotations. You may run BLEURT out-of-the-box or fine-tune it for your specific application (the latter is expected to perform better). See the [] file at for more information.
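As a minimal sketch of out-of-the-box use, the snippet below wraps the metric in a small helper that returns one score per prediction/reference pair. The helper name `score_with_bleurt` and its `checkpoint` parameter are hypothetical and only for illustration. It assumes a `datasets` release that still ships `load_metric` (newer releases moved metrics to the separate `evaluate` library) and that the BLEURT package itself is installed. The checkpoint is downloaded on first use.

```python
from typing import List, Optional


def score_with_bleurt(
    predictions: List[str],
    references: List[str],
    checkpoint: Optional[str] = None,
) -> List[float]:
    """Return one BLEURT score per (prediction, reference) pair.

    `checkpoint` is passed through as the metric's configuration name;
    None lets the metric fall back to its default checkpoint.
    """
    # Validate cheaply before loading the (large) model.
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return []

    # Imported lazily so the input checks above work even when
    # datasets/BLEURT are not installed.
    from datasets import load_metric

    metric = load_metric("bleurt", config_name=checkpoint)
    result = metric.compute(predictions=predictions, references=references)
    # The metric returns a dict whose "scores" entry is a list of floats.
    return result["scores"]
```

For example, `score_with_bleurt(["the cat sat on the mat"], ["a cat was sitting on the mat"])` should return a single-element list. Higher values indicate that the candidate is closer to the reference.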


  title={BLEURT: Learning Robust Metrics for Text Generation},
  author={Thibault Sellam and Dipanjan Das and Ankur P. Parikh},