## Social Impact

Multilingual Visual Question Answering has received little attention. Very few multilingual VQA datasets exist, and that is the gap we wanted to address here. Our initial plan was to include 4 high-resource and 4 low-resource languages in our training data. However, the existing translations do not perform well enough, and we would have ended up with poor labels, not to mention a longer training time. We hope to improve this in the future by using better translators (e.g. the Google Translate API) to get more multilingual data, especially in low-resource languages. Regardless, our aim with this project was to provide a pipeline approach for multilingual visuo-linguistic pretraining and for performing Multilingual Visual Question Answering.