---
license: mit
language:
- en
pipeline_tag: text-classification
tags:
- transformers
- negation
- evaluation
- metric
datasets:
- tum-nlp/cannot-dataset
---
# Model Card for NegBLEURT

This model is a negation-aware version of the BLEURT metric for evaluating generated text.
### Direct Use

Pass each reference and its candidate to the tokenizer as a sentence pair. The model outputs one score per pair, and higher scores indicate a better match to the reference.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "tum-nlp/NegBLEURT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

references = ["Ray Charles is legendary.", "Ray Charles is legendary."]
candidates = ["Ray Charles is a legend.", "Ray Charles isn’t legendary."]

# Encode each (reference, candidate) pair as one sequence-pair input
tokenized = tokenizer(references, candidates, return_tensors="pt", padding=True)

# Inference only, so no gradients are needed
with torch.no_grad():
    print(model(**tokenized).logits)
# returns scores 0.8409 and 0.2601 for the two candidates
```
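The direct-use example takes references and candidates as two parallel lists, while the pipeline example that follows wraps each pair as `[[reference, candidate]]`. A minimal sketch of converting between the two, assuming you already have parallel lists; `to_pipeline_pairs` is a hypothetical helper, not part of `transformers`:

```python
from typing import List


def to_pipeline_pairs(references: List[str], candidates: List[str]) -> List[List[List[str]]]:
    """Hypothetical helper: turn parallel reference/candidate lists into the
    nested [[reference, candidate]] format used in the pipeline example."""
    if len(references) != len(candidates):
        raise ValueError("references and candidates must have the same length")
    # Wrap each pair in an extra list, matching the pipeline example's input shape
    return [[[ref, cand]] for ref, cand in zip(references, candidates)]


references = ["Ray Charles is legendary.", "Ray Charles is legendary."]
candidates = ["Ray Charles is a legend.", "Ray Charles isn’t legendary."]
pairwise_input = to_pipeline_pairs(references, candidates)
print(pairwise_input)
```

Keeping the reference first in each pair mirrors the argument order of `tokenizer(references, candidates, ...)` above.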
### Use with a pipeline

```python
from transformers import pipeline

# Set function_to_apply="none" to get the raw regression output
pipe = pipeline("text-classification", model="tum-nlp/NegBLEURT", function_to_apply="none")

# Each input is one (reference, candidate) pair, wrapped in an extra list
pairwise_input = [
    [["Ray Charles is legendary.", "Ray Charles is a legend."]],
    [["Ray Charles is legendary.", "Ray Charles isn’t legendary."]],
]
print(pipe(pairwise_input))
# returns [{'label': 'NegBLEURT_score', 'score': 0.8408917784690857}, {'label': 'NegBLEURT_score', 'score': 0.26007288694381714}]
```
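Why `function_to_apply="none"` matters: NegBLEURT has a single regression output, and for single-output models the text-classification pipeline may by default pass the score through a sigmoid (the exact default depends on the model config and the `transformers` version, so treat this as an assumption to verify). The standard-library sketch below applies a sigmoid to the scores reported in the pipeline example to show the effect: the ranking of candidates is preserved, but the values no longer match the raw NegBLEURT scores.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function, as a sigmoid post-processing step would apply it."""
    return 1.0 / (1.0 + math.exp(-x))


# Raw NegBLEURT scores as reported in the pipeline example
raw_scores: List[float] = [0.8408917784690857, 0.26007288694381714]
squashed = [sigmoid(s) for s in raw_scores]

for raw, sq in zip(raw_scores, squashed):
    print(f"raw={raw:.4f}  after sigmoid={sq:.4f}")

# The sigmoid is monotonic, so the candidate ordering does not change
order_raw = sorted(range(len(raw_scores)), key=lambda i: raw_scores[i])
order_squashed = sorted(range(len(squashed)), key=lambda i: squashed[i])
print(order_raw == order_squashed)
```

Because the transform is monotonic, rankings survive either way, but absolute scores are only comparable to published NegBLEURT numbers when the raw output is kept.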
## Training Details

The model is a fine-tuned version of the [bleurt-tiny](https://github.com/google-research/bleurt/tree/master/bleurt/test_checkpoint) checkpoint from the official BLEURT repository.

It was fine-tuned on the train split of the CANNOT dataset for 500 steps using the [fine-tuning script](https://github.com/google-research/bleurt/blob/master/bleurt/finetune.py) provided by BLEURT.
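BLEURT's fine-tuning script reads ratings from JSONL files. As we understand the BLEURT documentation, each line is a JSON object with the keys `reference`, `candidate` and `score`; treat that schema as an assumption and check it against the BLEURT repository. How CANNOT's labels were mapped to scores is not stated in this card, so the rows below are hypothetical placeholders, and `write_bleurt_jsonl` is an illustrative helper rather than part of BLEURT:

```python
import io
import json
from typing import Iterable, TextIO, Tuple


def write_bleurt_jsonl(rows: Iterable[Tuple[str, str, float]], out: TextIO) -> int:
    """Write (reference, candidate, score) triples as BLEURT-style JSONL.

    Assumption: one JSON object per line with the keys "reference",
    "candidate" and "score". Returns the number of lines written.
    """
    count = 0
    for reference, candidate, score in rows:
        record = {"reference": reference, "candidate": candidate, "score": float(score)}
        # ensure_ascii=False keeps characters such as ’ readable in the file
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


# Hypothetical rows with placeholder scores; NOT the actual CANNOT labels
example_rows = [
    ("Ray Charles is legendary.", "Ray Charles is a legend.", 1.0),
    ("Ray Charles is legendary.", "Ray Charles isn’t legendary.", 0.0),
]

# Use open("train.jsonl", "w", encoding="utf-8") instead to write a real file
buffer = io.StringIO()
n_written = write_bleurt_jsonl(example_rows, buffer)
print(buffer.getvalue())
```

In BLEURT's own fine-tuning example, such files are passed to `bleurt.finetune` through its `-train_set` and `-dev_set` flags; the repository remains the authoritative reference for the full flag list.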
## Citation

If you use our model, please cite our [INLG 2023 paper](https://arxiv.org/abs/2307.13989).

**BibTeX:**

```bibtex
@misc{anschütz2023correct,
  title={This is not correct! Negation-aware Evaluation of Language Generation Systems},
  author={Miriam Anschütz and Diego Miguel Lozano and Georg Groh},
  year={2023},
  eprint={2307.13989},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```