khuangaf committed on
Commit
1821682
1 Parent(s): c37ad4d

update readme

  1. README.md +66 -0
README.md CHANGED
@@ -0,0 +1,66 @@
---
language: en
---

# ChartVE (Chart Visual Entailment)

ChartVE is a visual entailment model introduced in the paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning" for evaluating the factuality of a generated caption sentence with respect to the input chart. The model takes a chart figure and a caption sentence as input and outputs an entailment probability. To compute the entailment probability, see the "How to use" section below. The underlying architecture of this model is UniChart.

### How to use

Using the pre-trained model directly:
```python
import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

model_name = "khhuang/chartve"
# .cuda() assumes a CUDA-capable GPU is available
model = VisionEncoderDecoderModel.from_pretrained(model_name).cuda()
processor = DonutProcessor.from_pretrained(model_name)

IMAGE_PATH = "PATH_TO_IMAGE"

def format_query(sentence):
    # The query wording is kept verbatim from the original README
    return f"Does the image entails this statement: \"{sentence}\"?"

# Format text inputs
CAPTION_SENTENCE = "The state that has the highest number of population is California."
query = format_query(CAPTION_SENTENCE)

# Encode chart figure and tokenize text
img = Image.open(IMAGE_PATH)
pixel_values = processor(img.convert("RGB"), random_padding=False, return_tensors="pt").pixel_values
pixel_values = pixel_values.cuda()
decoder_input_ids = processor.tokenizer(query, add_special_tokens=False, return_tensors="pt", max_length=510).input_ids.cuda()

# Inference only, so gradients are not needed
with torch.no_grad():
    outputs = model(pixel_values, decoder_input_ids=decoder_input_ids)

# Logits at the last decoder position: token ID 2334 is the negative logit, 49922 the positive logit
last_token_logits = outputs["logits"].squeeze()[-1, [2334, 49922]]

# Probe the probability of generating "yes"
binary_entail_prob_positive = torch.nn.functional.softmax(last_token_logits, dim=-1)[1].item()

# binary_entail_prob_positive corresponds to the computed probability that the chart entails the caption sentence.
```
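
The final step of the snippet above is a softmax restricted to two logits at the last decoder position. The arithmetic can be sketched in plain Python, assuming the negative and positive logits have already been read out as floats. `entailment_probability` is a hypothetical helper name used for illustration and is not part of the released code; the input values are made up, not real model outputs.

```python
import math

def entailment_probability(negative_logit: float, positive_logit: float) -> float:
    """Two-way softmax: probability mass assigned to the positive logit."""
    # Subtract the larger logit before exponentiating for numerical stability
    largest = max(negative_logit, positive_logit)
    exp_neg = math.exp(negative_logit - largest)
    exp_pos = math.exp(positive_logit - largest)
    return exp_pos / (exp_neg + exp_pos)

# Illustrative values only (assumption), not outputs of the ChartVE model
prob = entailment_probability(negative_logit=-1.0, positive_logit=1.0)
print(f"{prob:.3f}")
```

Because only two logits enter the softmax, the result equals the logistic sigmoid of `positive_logit - negative_logit`, so the score depends only on the gap between the two logits.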

### Citation
```
@inproceedings{huang-etal-2023-do,
    title = "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning",
    author = "Huang, Kung-Hsiang and
      Zhou, Mingyang and
      Chan, Hou Pong and
      Fung, Yi R. and
      Wang, Zhenhailong and
      Zhang, Lingyu and
      Chang, Shih-Fu and
      Ji, Heng",
    year = {2023},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}
```