# WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning

A factual consistency evaluator/metric from the ACL 2023 paper of the same name.

Open-sourced code: https://github.com/nightdessert/WeCheck
## Model description

WeCheck is a factual consistency metric trained from weakly annotated samples.

This WeCheck checkpoint can be used to check outputs of the following three generation tasks: text summarization, knowledge-grounded dialogue generation, and paraphrasing.

The checkpoint is trained from the following three weak labelers: QAFactEval, SummaC, and NLI warmup.
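To give an intuition for weak supervision from multiple labelers, the sketch below combines binary judgments from several weak labelers by majority vote. This is an illustrative simplification only: WeCheck itself learns from noisy labels rather than taking a simple vote, and the function name and sample data here are made up for the example.

```python
def majority_vote(labels):
    """Aggregate 0/1 weak labels for one sample by majority vote.

    Illustrative only -- WeCheck uses learned aggregation of weak
    labelers (QAFactEval, SummaC, NLI), not a hard vote.
    """
    return 1 if 2 * sum(labels) >= len(labels) else 0

# Each inner list holds one sample's labels from three weak labelers.
samples = [[1, 1, 0], [0, 0, 1]]
aggregated = [majority_vote(s) for s in samples]
print(aggregated)  # [1, 0]
```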
## How to use the model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model_name = "nightdessert/WeCheck"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)

premise = "I first thought that I liked the movie, but upon second thought it was actually disappointing."  # input for summarization / dialogue / paraphrase
hypothesis = "The movie was not good."  # generated output to check

inputs = tokenizer(premise, hypothesis, truncation="only_first", max_length=512, return_tensors="pt")
logits = model(inputs["input_ids"].to(device)).logits[:, 0]
prediction = torch.sigmoid(logits).tolist()
print(prediction)  # [0.884]
```
Or apply it to a batch of samples:
```python
premise = ["I first thought that I liked the movie, but upon second thought it was actually disappointing."] * 3  # input list
hypothesis = ["The movie was not good."] * 3  # output list

batch_tokens = tokenizer.batch_encode_plus(list(zip(premise, hypothesis)), padding=True,
                                           truncation="only_first", max_length=512, return_tensors="pt")
logits = model(batch_tokens["input_ids"].to(device)).logits[:, 0]
prediction = torch.sigmoid(logits).tolist()
print(prediction)  # [0.884, 0.884, 0.884]
```
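The model returns a probability-like consistency score per sample. If you need a hard consistent/inconsistent decision, a simple option is to threshold the scores. The helper name and the 0.5 cutoff below are illustrative assumptions, not part of the released model; tune the threshold on your own validation data.

```python
def label_scores(scores, threshold=0.5):
    """Map sigmoid scores to labels.

    The 0.5 threshold is an assumption for illustration; calibrate it
    per task (summarization / dialogue / paraphrase) on held-out data.
    """
    return ["consistent" if s >= threshold else "inconsistent" for s in scores]

print(label_scores([0.884, 0.12]))  # ['consistent', 'inconsistent']
```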
---
license: openrail
pipeline_tag: text-classification
language:
- en
tags:
- Factual Consistency
- Natural Language Inference
- Factual Consistency Evaluation
---