How to try VQA

#4
by seemorebricks - opened

Hello,

I came across this passage in your paper regarding implementing VQA and/or captioning:

'''
We utilize the METER (Dou et al., 2022) framework to facilitate our experiments on visual question answering (VQA). It formulates the VQA task as a classification task. The core module of METER is a transformer-based co-attention multimodal fusion module that produces cross-modal representations over the image and text encodings, which are then fed to a classifier for predicting the final answer.
'''

Is there any source code available for this task? Apologies, I'm new to the field, so it's not quite intuitive to me.

Microsoft org

Please refer to the METER package at https://github.com/zdou0830/METER.
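For intuition, below is a minimal, self-contained sketch (in PyTorch) of the "VQA as classification" idea from the quoted passage: image and text token features are fused with cross-attention, pooled, and passed to a classifier that predicts an answer from a fixed vocabulary. The class name, layer sizes, single fusion layer per direction, and pooling choice are illustrative assumptions, not METER's actual code; the real co-attention fusion module lives in the repository linked above.

```python
# Toy sketch of VQA-as-classification: cross-modal fusion + answer classifier.
# All names and dimensions here are illustrative, not METER's implementation.
import torch
import torch.nn as nn


class ToyVQAClassifier(nn.Module):
    def __init__(self, hidden_dim=768, num_answers=3129):  # 3129 = common VQAv2 answer-vocab size
        super().__init__()
        # One cross-attention block per direction (METER stacks several co-attention layers).
        self.text_to_image = nn.MultiheadAttention(hidden_dim, num_heads=8, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(hidden_dim, num_heads=8, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim * 2, hidden_dim * 2),
            nn.GELU(),
            nn.Linear(hidden_dim * 2, num_answers),
        )

    def forward(self, text_feats, image_feats):
        # text_feats:  (batch, text_len,  hidden_dim) from a text encoder
        # image_feats: (batch, image_len, hidden_dim) from an image encoder
        t_fused, _ = self.text_to_image(text_feats, image_feats, image_feats)
        i_fused, _ = self.image_to_text(image_feats, text_feats, text_feats)
        # Pool each stream (here: first token) and classify over the answer vocabulary.
        pooled = torch.cat([t_fused[:, 0], i_fused[:, 0]], dim=-1)
        return self.classifier(pooled)  # logits over candidate answers


if __name__ == "__main__":
    model = ToyVQAClassifier()
    text = torch.randn(2, 20, 768)    # stand-in for encoded question tokens
    image = torch.randn(2, 196, 768)  # stand-in for encoded image patches
    print(model(text, image).shape)   # torch.Size([2, 3129])
```

In practice you would plug in the encoders' real outputs and train the classifier on question-answer pairs; the METER repository provides the full pipeline.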

shengz changed discussion status to closed
