Metric: bleurt

BLEURT is a learnt evaluation metric for Natural Language Generation. It is built using multiple phases of transfer learning: it starts from a pretrained BERT model (Devlin et al. 2018), applies another pre-training phase using synthetic data, and is finally trained on WMT human annotations. You may run BLEURT out of the box or fine-tune it for your specific application (the latter is expected to perform better). See the README.md file at https://github.com/google-research/bleurt for more information.

How to load this metric directly with the datasets library:

from datasets import load_metric
metric = load_metric("bleurt")  # "bleurt" is the metric name given in the title
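Once loaded, the metric can be used to score candidate sentences against references. The sketch below is illustrative rather than authoritative: the helper name `score_with_bleurt` is hypothetical, and it assumes a `datasets` release that still ships `load_metric` (newer releases moved metrics to the separate `evaluate` library) and that the BLEURT package from the repository above is installed. It also assumes the result dictionary exposes the per-pair values under a `"scores"` key.

```python
from typing import List


def score_with_bleurt(predictions: List[str], references: List[str]) -> List[float]:
    """Return one BLEURT score per (prediction, reference) pair.

    Hypothetical helper for illustration; not part of the datasets library.
    """
    # Validate inputs first: loading the metric may download a checkpoint,
    # so obvious mistakes should fail before that happens.
    if len(predictions) != len(references):
        raise ValueError(
            f"got {len(predictions)} predictions but {len(references)} references"
        )
    if not predictions:
        return []

    # Imported lazily so the helper can be defined even when the
    # dependency is not installed yet.
    from datasets import load_metric

    metric = load_metric("bleurt")
    # Assumption: compute() returns a dict with a "scores" list,
    # one float per input pair.
    result = metric.compute(predictions=predictions, references=references)
    return list(result["scores"])
```

Calling `score_with_bleurt(["hello there"], ["hi there"])` should return a list with a single float. The first call may take a while because the BLEURT checkpoint has to be fetched.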

