
Visual Question Answering (VQA) is a task in which an AI system is expected to answer a question about a given image. VQA has been an active area of research for the past 4-5 years, with most datasets built from natural images found online; VQAv2 and GQA are two examples. It is a particularly interesting multi-modal machine learning challenge because it has applications across several domains, including healthcare chatbots and interactive agents. However, most VQA challenges and datasets deal with English-only captions and questions.

In addition, even recently proposed VQA approaches are often hard to follow, because the CNN-based object detectors they rely on for feature extraction are relatively difficult to use and add complexity. Click on the expandable region below to see the steps for a FasterRCNN-based approach.