metadata

title: Negbleurt
emoji: 🌖
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 3.38.0
app_file: app.py
pinned: false
license: mit

Metric Card for NegBLEURT

Metric Description

NegBLEURT is the negation-aware version of the BLEURT metric. It can be used to evaluate generated text against a reference.
BLEURT is a learnt evaluation metric for Natural Language Generation. It is built using multiple phases of transfer learning, starting from a pretrained BERT model (Devlin et al. 2018) and then employing another pre-training phase using synthetic data. Finally, it is trained on WMT human annotations and the CANNOT negation awareness dataset.

How to Use

At minimum, this metric requires predictions and references as inputs.

>>> import evaluate
>>> negBLEURT = evaluate.load('tum-nlp/negbleurt')
>>> predictions = ["Ray Charles is a legend.", "Ray Charles isn’t legendary."]
>>> references = ["Ray Charles is legendary.", "Ray Charles is legendary."]
>>> results = negBLEURT.compute(predictions=predictions, references=references)
>>> print(results)
{'negBLERUT': [0.8409, 0.2601]}

Inputs

predictions: list of predictions to score. Each prediction should be a string.
references: list of references, one for each prediction. Each reference should be a string.
batch_size (optional): batch size for model inference. Default is 16.
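
The input requirements above can be checked before the metric is called. The following is a minimal sketch and not part of the NegBLEURT package: `score_pairs` is a hypothetical helper that validates the inputs and then forwards them, together with `batch_size`, to any metric object exposing a `compute` method, such as the one returned by `evaluate.load('tum-nlp/negbleurt')`.

```python
from typing import Any, Dict, List


def score_pairs(metric: Any, predictions: List[str], references: List[str],
                batch_size: int = 16) -> Dict[str, List[float]]:
    """Validate the inputs, then delegate to metric.compute (hypothetical helper)."""
    # Each prediction needs exactly one reference.
    if len(predictions) != len(references):
        raise ValueError(
            f"Got {len(predictions)} predictions but {len(references)} references."
        )
    # Both inputs must be lists of strings.
    for name, texts in (("predictions", predictions), ("references", references)):
        for i, text in enumerate(texts):
            if not isinstance(text, str):
                raise TypeError(
                    f"{name}[{i}] must be a string, got {type(text).__name__}"
                )
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    # batch_size is forwarded unchanged; the metric uses it for model inference.
    return metric.compute(predictions=predictions, references=references,
                          batch_size=batch_size)
```

With the metric loaded as `negBLEURT`, a call such as `score_pairs(negBLEURT, predictions, references, batch_size=8)` forwards the pairs to `compute` with a smaller batch, which may help when memory is limited.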

Output Values

negBLEURT (list of float): NegBLEURT scores, one per prediction. Values usually range between 0 and 1, where 1 indicates a perfect prediction and 0 indicates a poor fit.

Output Example(s):

{'negBLERUT': [0.8409, 0.2601]}

This metric outputs a dictionary containing the list of NegBLEURT scores.
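
As an illustration of working with this dictionary, the sketch below averages the scores and lists the predictions that fall below a cutoff. The helper name `summarize` and the threshold of 0.5 are assumptions made for this example only; neither comes from the NegBLEURT package or the paper. The score values are copied from the output example above.

```python
from statistics import mean
from typing import Dict, List, Tuple


def summarize(results: Dict[str, List[float]], predictions: List[str],
              threshold: float = 0.5) -> Tuple[float, List[str]]:
    """Return the mean score and the predictions scoring below `threshold`."""
    # The dictionary holds a single list of scores; unpack it without
    # hard-coding the key name.
    (scores,) = results.values()
    if len(scores) != len(predictions):
        raise ValueError("expected one score per prediction")
    # threshold is an arbitrary cutoff chosen for illustration.
    flagged = [p for p, s in zip(predictions, scores) if s < threshold]
    return mean(scores), flagged


# Values copied from the output example in this metric card.
results = {'negBLERUT': [0.8409, 0.2601]}
predictions = ["Ray Charles is a legend.", "Ray Charles isn’t legendary."]
average, flagged = summarize(results, predictions)
print(round(average, 4), flagged)
```

Here the negated prediction is the only one flagged, since its score is well below that of the paraphrase.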

Limitations and Bias

This metric is based on BERT (Devlin et al. 2018) and, as such, inherits its biases and weaknesses. It was trained in a negation-aware setting and thus overcomes BERT's issues with negation awareness.

Currently, NegBLEURT is only available in English.

Citation

Please cite our INLG 2023 paper if you use our metric. BibTeX:

@misc{anschütz2023correct,
      title={This is not correct! Negation-aware Evaluation of Language Generation Systems}, 
      author={Miriam Anschütz and Diego Miguel Lozano and Georg Groh},
      year={2023},
      eprint={2307.13989},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Further References

The original NegBLEURT GitHub repo