Results on PlotQA-V1 do not match the paper

by AliFarahani - opened


I tested the model on the PlotQA-V1 test set, which has 1.2M Q/A pairs. The results appear to differ from those reported in the original paper. Did the experiments in the original paper involve any preprocessing?

I used the following processor and model in my experiments and achieved only ~40% accuracy:
processor = Pix2StructProcessor.from_pretrained('google/matcha-base')
model = Pix2StructForConditionalGeneration.from_pretrained('google/matcha-plotqa-v1')

Some example outputs:
idx, ground truth answer, generated answer

700000,bottom right,top right
700003,Number of mobile cellular subscribers,Net bilateral aid flow in an economy from Canada
700006,No. of subscribers (per 100 people),Aid flow (current US$)

I ran into the same issue. When I apply the evaluation metric from the paper, I get a score of 80%. However, that still does not match the paper (90%). I would like to know the reason.
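
For anyone comparing numbers, the gap between ~40% and 80% may come down to the scoring rule alone. Below is a minimal sketch of a relaxed-accuracy style metric, assuming the paper's metric counts numeric answers as correct within a 5% relative tolerance and requires an exact match for text answers. The function names, the case-insensitive comparison, and the handling of `%` signs and thousands separators are illustrative assumptions, not the paper's evaluation code.

```python
import math


def to_float(text):
    """Parse a numeric string; return None if it is not a finite number.

    Stripping a trailing '%' and thousands separators is an assumption
    made for this sketch.
    """
    cleaned = text.strip().rstrip("%").replace(",", "")
    try:
        value = float(cleaned)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def relaxed_match(prediction, target, tolerance=0.05):
    """Return True if the prediction counts as correct under the relaxed rule."""
    pred_num = to_float(prediction)
    target_num = to_float(target)
    if pred_num is not None and target_num is not None:
        if target_num == 0:
            # Relative error is undefined for a zero target, so require equality.
            return pred_num == 0
        return abs(pred_num - target_num) / abs(target_num) <= tolerance
    # Non-numeric answers: exact match after trimming and lowercasing (assumed).
    return prediction.strip().lower() == target.strip().lower()


def relaxed_accuracy(predictions, targets, tolerance=0.05):
    """Fraction of predictions that match their targets under the relaxed rule."""
    if not targets:
        return 0.0
    matches = sum(
        relaxed_match(p, t, tolerance) for p, t in zip(predictions, targets)
    )
    return matches / len(targets)
```

Under this rule, a prediction of `102` for a target of `100` counts as correct, while text answers such as `top right` versus `bottom right` still count as wrong, so it can explain a higher score on numeric questions but not mismatches like the examples listed above.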
