Results on PlotQA-V1 do not match the paper

#1
by AliFarahani - opened

Hello,

I tested the model on the PlotQA-V1 test set (1.2M Q/A pairs). The results differ from those reported in the original paper. Did the experiments in the original paper involve any preprocessing?

I used the following processor and model in my experiments and achieved only ~40% accuracy:
processor = Pix2StructProcessor.from_pretrained('google/matcha-base')
model = Pix2StructForConditionalGeneration.from_pretrained('google/matcha-plotqa-v1')

Some example outputs:
idx, ground truth answer, generated answer

700000,bottom right,top right
700001,3,4
700002,vertical,vertical
700003,Number of mobile cellular subscribers,Net bilateral aid flow in an economy from Canada
700004,No,Yes
700005,Years,Years
700006,No. of subscribers (per 100 people),Aid flow (current US$)
700007,3,4
700008,No,No

I ran into the same issue. When I apply the evaluation metric from the paper, I get a score of 80%, which still does not match the 90% reported in the paper. I would like to know the reason.
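
For anyone comparing numbers, here is a minimal sketch of how the choice of metric can change the score. It assumes the metric in question is a relaxed accuracy of the kind commonly used for chart QA: numeric answers count as correct within a 5% relative tolerance, and all other answers need an exact string match. The function names, the 5% default, and the case-insensitive string comparison are assumptions for illustration, not the paper's official evaluation code.

```python
def _to_float(text):
    """Parse a numeric answer; return None if the text is not a number."""
    try:
        return float(text)
    except ValueError:
        return None


def relaxed_match(target, prediction, tolerance=0.05):
    """Return True if the prediction matches the target.

    Numeric answers match within a relative tolerance (assumed 5%);
    non-numeric answers need a case-insensitive exact match (an assumption).
    """
    t = _to_float(target)
    p = _to_float(prediction)
    if t is not None and p is not None:
        if t == 0:
            return p == 0
        return abs(p - t) / abs(t) <= tolerance
    return target.strip().lower() == prediction.strip().lower()


def relaxed_accuracy(pairs, tolerance=0.05):
    """Fraction of (target, prediction) pairs that match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(relaxed_match(t, p, tolerance) for t, p in pairs) / len(pairs)


# The nine example rows posted above, as (ground truth, generated) pairs.
sample = [
    ("bottom right", "top right"),
    ("3", "4"),
    ("vertical", "vertical"),
    ("Number of mobile cellular subscribers",
     "Net bilateral aid flow in an economy from Canada"),
    ("No", "Yes"),
    ("Years", "Years"),
    ("No. of subscribers (per 100 people)", "Aid flow (current US$)"),
    ("3", "4"),
    ("No", "No"),
]
print(f"relaxed accuracy on the sample: {relaxed_accuracy(sample):.3f}")
```

On these nine rows the relaxed and exact-match scores coincide, because the numeric misses (3 vs. 4) are far outside a 5% tolerance. The two metrics only diverge on answers that are numerically close.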
