Model Card for Model NegBLEURT

This model is a negation-aware version of the BLEURT metric for evaluation of generated text.

Direct Use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
model_name = "tum-nlp/NegBLEURT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

references = ["Ray Charles is legendary.", "Ray Charles is legendary."]
candidates = ["Ray Charles is a legend.", "Ray Charles isn’t legendary."]

tokenized = tokenizer(references, candidates, return_tensors='pt', padding=True)
print(model(**tokenized).logits)
# returns scores 0.8409 and 0.2601 for the two candidates
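
For downstream use it is often handier to get plain Python floats instead of a printed tensor. The following is a minimal sketch, not part of the official model card: the helper name `negbleurt_scores` is hypothetical, and it assumes the logits have shape `(batch, 1)`, which matches the one-score-per-pair output shown above.

```python
def negbleurt_scores(references, candidates, tokenizer, model):
    """Return one float score per (reference, candidate) pair.

    Hypothetical helper: `tokenizer` and `model` are the objects loaded
    above with AutoTokenizer / AutoModelForSequenceClassification.
    """
    if len(references) != len(candidates):
        raise ValueError("references and candidates must have the same length")

    # Same tokenizer call as in the snippet above: each reference is
    # paired with the candidate at the same index.
    tokenized = tokenizer(references, candidates, return_tensors="pt", padding=True)

    # Assumption: logits have shape (batch, 1), i.e. one regression
    # value per pair, so each row holds a single score.
    logits = model(**tokenized).logits
    return [row[0] for row in logits.tolist()]


# Usage (requires the model to be downloaded first):
# scores = negbleurt_scores(references, candidates, tokenizer, model)
```

For larger batches, calling the helper inside a `torch.no_grad()` context avoids building the autograd graph during inference.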

Use with pipeline

from transformers import pipeline

pipe = pipeline("text-classification", model="tum-nlp/NegBLEURT", function_to_apply="none") # set function_to_apply="none" for regression output!
pairwise_input = [
  [["Ray Charles is legendary.", "Ray Charles is a legend."]],
  [["Ray Charles is legendary.", "Ray Charles isn’t legendary."]]
]
print(pipe(pairwise_input))
# returns [{'label': 'NegBLEURT_score', 'score': 0.8408917784690857}, {'label': 'NegBLEURT_score', 'score': 0.26007288694381714}]
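
The pipeline returns one dictionary per input pair, in input order. As a small illustration of post-processing, the sketch below pairs each candidate with its score and sorts them best first. The function name `rank_candidates` is hypothetical, and the example data is copied from the output shown above rather than produced by running the model.

```python
def rank_candidates(candidates, pipe_output):
    """Pair each candidate with its score and sort by score, highest first."""
    if len(candidates) != len(pipe_output):
        raise ValueError("expected one pipeline result per candidate")
    paired = [(cand, result["score"]) for cand, result in zip(candidates, pipe_output)]
    return sorted(paired, key=lambda item: item[1], reverse=True)


# Example data taken from the pipeline output listed above.
example_output = [
    {"label": "NegBLEURT_score", "score": 0.8408917784690857},
    {"label": "NegBLEURT_score", "score": 0.26007288694381714},
]
example_candidates = ["Ray Charles is a legend.", "Ray Charles isn’t legendary."]

for candidate, score in rank_candidates(example_candidates, example_output):
    print(f"{score:.4f}  {candidate}")
```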

Training Details

The model is a fine-tuned version of the bleurt-tiny checkpoint from the official BLEURT repository. It was fine-tuned on the train split of the CANNOT dataset for 500 steps, using the fine-tuning script provided by BLEURT.

Citation

If you use our model, please cite our INLG 2023 paper. BibTeX:

@misc{anschütz2023correct,
      title={This is not correct! Negation-aware Evaluation of Language Generation Systems}, 
      author={Miriam Anschütz and Diego Miguel Lozano and Georg Groh},
      year={2023},
      eprint={2307.13989},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
