
This demo loads the `FlaxCLIPVisionBertForSequenceClassificationModel` from the `model` directory of this repository. The weights come from `ckpt/ckpt-60k-5999`, a checkpoint pre-trained for 60k steps and then fine-tuned for 5999 steps. One hundred randomly sampled validation examples are provided in `dummy_vqa_multilingual.tsv`, with their images in the `images/val2014` directory.
We also show an English translation of the question for users who are not familiar with the other languages. The translation is produced with the `mtranslate` library, which keeps things flexible but requires an internet connection because it relies on Google Translate.
The model predicts the answer from a fixed set of 3129 candidate answers, whose labels are stored in `answer_reverse_mapping.json`.
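Mapping the model's output to an answer string comes down to two steps: take the highest-scoring label, then look it up. The sketch below assumes that `answer_reverse_mapping.json` is a JSON object mapping string label indices (`"0"`, `"1"`, ...) to answer strings. The actual file layout is not documented here, and the helper names are hypothetical.

```python
import json
from typing import Dict, Mapping, Sequence


def parse_label_to_answer(raw: Mapping[str, str]) -> Dict[int, str]:
    """Convert a JSON-decoded mapping into {label index: answer}.

    Assumption: keys are label indices stored as strings, because JSON
    object keys are always strings.
    """
    return {int(label): answer for label, answer in raw.items()}


def load_label_to_answer(path: str) -> Dict[int, str]:
    """Read the (assumed) label -> answer mapping from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return parse_label_to_answer(json.load(f))


def predict_answer(logits: Sequence[float], label_to_answer: Mapping[int, str]) -> str:
    """Return the answer whose label has the highest logit (argmax)."""
    best_label = max(range(len(logits)), key=lambda i: logits[i])
    return label_to_answer[best_label]


if __name__ == "__main__":
    # Toy example with 3 labels instead of the real 3129.
    mapping = parse_label_to_answer({"0": "yes", "1": "no", "2": "2"})
    print(predict_answer([0.1, 2.5, -1.0], mapping))  # prints "no"
```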
Lastly, one can choose the answer language. This uses a saved dictionary of translations for the 3129 answer options, also created with the `mtranslate` library.
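Because the answer set is fixed, its translations can be computed once offline and then served by dictionary lookup. Choosing an answer language then needs no network call at prediction time. The sketch below is an assumption about how such a table could be built and used:

- `build_answer_table` and `translate_answer` are hypothetical helpers.
- The nested `{answer: {language: translation}}` layout is not necessarily the one used by the saved dictionary in this repository.
- The translation function is passed in. With `mtranslate` it might be something like `lambda text, lang: translate(text, lang, "en")`.

```python
from typing import Callable, Dict, Iterable

# Hypothetical layout: English answer -> {language code -> translated answer}.
AnswerTable = Dict[str, Dict[str, str]]


def build_answer_table(
    answers: Iterable[str],
    languages: Iterable[str],
    translate_fn: Callable[[str, str], str],
) -> AnswerTable:
    """Translate every answer into every target language once, offline."""
    langs = list(languages)  # materialize so it can be reused for each answer
    return {answer: {lang: translate_fn(answer, lang) for lang in langs} for answer in answers}


def translate_answer(answer: str, lang: str, table: AnswerTable) -> str:
    """Look up a precomputed translation, falling back to the English answer."""
    if lang == "en":
        return answer
    return table.get(answer, {}).get(lang, answer)


if __name__ == "__main__":
    # Stand-in translator so the example runs without network access.
    fake = {("dog", "fr"): "chien", ("dog", "de"): "Hund"}
    table = build_answer_table(["dog"], ["fr", "de"], lambda text, lang: fake[(text, lang)])
    print(translate_answer("dog", "de", table))  # prints "Hund"
    print(translate_answer("cat", "fr", table))  # no entry, prints "cat"
```

Falling back to the English answer is one design choice: a missing translation never breaks the display.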
The top-5 predictions are displayed below, with their confidence scores shown as a bar plot.
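A top-5 list can be obtained by normalizing the logits and keeping the five largest scores. This sketch assumes the confidence scores are softmax probabilities over the 3129 labels. If the model was instead trained with a per-label sigmoid, as some VQA setups are, independent sigmoids would be the natural choice. `top_k_predictions` is a hypothetical helper written in plain Python. In JAX, the same steps correspond to `jax.nn.softmax` followed by `jax.lax.top_k`.

```python
import math
from typing import List, Mapping, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_predictions(
    logits: Sequence[float], label_to_answer: Mapping[int, str], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k most likely (answer, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(label_to_answer[i], probs[i]) for i in ranked]


if __name__ == "__main__":
    # Toy example with 6 labels; the real model has 3129.
    labels = dict(enumerate(["yes", "no", "2", "red", "dog", "tennis"]))
    for answer, p in top_k_predictions([1.0, 3.0, 2.0, 0.0, -1.0, 5.0], labels):
        print(f"{answer}: {p:.3f}")
```

The resulting (answer, probability) pairs can be passed directly to a bar-plot routine such as Matplotlib's `barh`.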