  • This demo loads the FlaxCLIPVisionBertForSequenceClassificationModel present in the model directory of this repository. The checkpoint is loaded from ckpt/ckpt-60k-5999, which was pre-trained for 60k steps and then fine-tuned for 5999 steps. 100 random validation set examples are provided in dummy_vqa_multilingual.tsv, with their respective images in the images/val2014 directory.

  • We provide an English translation of the question for users who are not well acquainted with the other languages. The translation is done with mtranslate to keep things flexible; it requires an internet connection because it uses the Google Translate API.

  • The model predicts an answer from a fixed list of 3129 candidate answers, whose labels are stored in answer_reverse_mapping.json.

  • Lastly, one can choose the Answer Language, which also uses a saved dictionary created with the mtranslate library for the 3129 answer options.

  • The top-5 predictions are displayed below, with their respective confidence scores shown in the form of a bar plot.
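
As a rough illustration of how classifier outputs can be turned into the top-5 answers and confidence scores described above, here is a minimal sketch in plain Python. It is not the demo's actual code. The function name `top_k_answers`, the toy logits, and the toy mapping are hypothetical, and the sketch assumes that answer_reverse_mapping.json maps label indices (as strings) to answer strings; the real file layout may differ.

```python
import json
import math


def top_k_answers(logits, reverse_mapping, k=5):
    """Return the k most probable (answer, confidence) pairs.

    Hypothetical helper: assumes reverse_mapping maps str(label index) -> answer.
    """
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k largest probabilities, highest first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # JSON object keys are strings, so label indices are looked up as str(i).
    return [(reverse_mapping[str(i)], probs[i]) for i in top]


# Toy stand-in for answer_reverse_mapping.json (the real list has 3129 answers).
toy_mapping = json.loads(
    '{"0": "yes", "1": "no", "2": "2", "3": "red", "4": "dog", "5": "cat"}'
)
# Toy logits, one per answer label; a real model would output 3129 values.
toy_logits = [2.0, 1.0, 0.5, -1.0, 0.0, 3.0]

for answer, confidence in top_k_answers(toy_logits, toy_mapping):
    print(f"{answer}: {confidence:.3f}")
```

The resulting (answer, confidence) pairs are the kind of data a bar plot of the top-5 predictions would be drawn from.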