
## Fine-tuning

### Dataset

For fine-tuning, we use the VQA 2.0 dataset, specifically its train and validation sets. We translate all the questions into the four languages specified above using language-specific MarianMT models. We chose MarianMT for this step because it produces higher-quality translations and is faster, which makes it better suited for preparing the fine-tuning data. This gives us 4x the number of examples in each subset.
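
Below is a minimal sketch of what this translation step could look like, assuming the target languages include French, German, and Spanish (the exact set is given earlier in the document) and the `Helsinki-NLP/opus-mt-*` MarianMT checkpoints; the function and variable names are hypothetical.

```python
# Hypothetical sketch: translate VQA questions with language-specific
# MarianMT checkpoints (target-language set assumed for illustration).
from transformers import MarianMTModel, MarianTokenizer

TARGET_CHECKPOINTS = {
    "fr": "Helsinki-NLP/opus-mt-en-fr",
    "de": "Helsinki-NLP/opus-mt-en-de",
    "es": "Helsinki-NLP/opus-mt-en-es",
}

def translate_questions(questions, checkpoint):
    """Translate a batch of English questions with one MarianMT model."""
    tokenizer = MarianTokenizer.from_pretrained(checkpoint)
    model = MarianMTModel.from_pretrained(checkpoint)
    batch = tokenizer(questions, return_tensors="pt",
                      padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

questions = ["What color is the cat?", "How many people are in the photo?"]
translated = {"en": questions}  # keep the original English questions
for lang, checkpoint in TARGET_CHECKPOINTS.items():
    translated[lang] = translate_questions(questions, checkpoint)
```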

### Model

We use the SequenceClassification models as a reference to create our own sequence-classification model, in which a classification layer is attached on top of the pre-trained BERT model to perform multi-class classification. As is the convention for the English VQA task, 3129 answer labels are chosen (the list can be found here); these are the same labels used when fine-tuning the VisualBERT models. The outputs shown here were translated using the mtranslate Google Translate API library. We then take various pre-trained checkpoints and train the sequence-classification model for varying numbers of steps.
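
To make the architecture concrete, here is a minimal sketch of a classification head over a pre-trained BERT encoder, mirroring the `*ForSequenceClassification` pattern described above. The checkpoint name, class name, and dropout rate are assumptions for illustration, not the project's actual implementation (which also fuses visual features with the text encoder).

```python
# Hypothetical sketch of a BERT-based answer classifier with 3129 classes.
import torch.nn as nn
from transformers import BertModel

NUM_ANSWERS = 3129  # conventional answer-vocabulary size for English VQA

class VQASequenceClassifier(nn.Module):
    def __init__(self, checkpoint="bert-base-multilingual-uncased"):  # assumed
        super().__init__()
        self.bert = BertModel.from_pretrained(checkpoint)
        self.dropout = nn.Dropout(0.1)  # assumed rate
        self.classifier = nn.Linear(self.bert.config.hidden_size, NUM_ANSWERS)

    def forward(self, input_ids, attention_mask=None, labels=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Classify from the pooled [CLS] representation.
        logits = self.classifier(self.dropout(outputs.pooler_output))
        loss = None
        if labels is not None:
            # Multi-class classification over the 3129 answer labels.
            loss = nn.functional.cross_entropy(logits, labels)
        return loss, logits
```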
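
For the translated outputs, the mtranslate library exposes a simple `translate` function; a small hedged example (the answer string is made up):

```python
# mtranslate usage: translate(text, to_language, from_language).
from mtranslate import translate

print(translate("two", "fr", "en"))  # e.g. "deux"
```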