* This demo loads the `FlaxCLIPVisionBertForMaskedLM` model defined in the `model` directory of this repository. The weights come from the `flax-community/clip-vision-bert-cc12m-70k` checkpoint, which was pre-trained for 70k steps (a loading sketch follows this list).

* 100 random validation-set examples are provided in `cc12m_data/vqa_val.tsv`, with the corresponding images in the `cc12m_data/images_data` directory.

* You can get a random example by clicking the **Get a random example** button. The caption is tokenized and a random token is masked by replacing it with `[MASK]` (sketched below).

* We provide an English translation of the caption for users who are not familiar with the other languages. The translation is produced with `mtranslate` (sketched below), which keeps things flexible but requires an internet connection because it calls the Google Translate API.

* The model predicts scores for the masked position over the vocabulary of the `bert-base-multilingual-uncased` checkpoint.

* The top-5 predictions are displayed below, and their respective confidence scores are shown as a bar plot (sketched after this list).
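
Loading the checkpoint could look roughly like this. It is a minimal sketch: the import path is hypothetical (the class actually lives in this repository's `model` directory), and a `from_pretrained` method following the usual Hugging Face Flax convention is assumed rather than confirmed.

```python
# Minimal sketch of loading the pre-trained checkpoint.
# The import path below is hypothetical; the actual module lives in this
# repository's `model` directory. `from_pretrained` is assumed to follow the
# usual Hugging Face Flax model convention.
from model.modeling_clip_vision_bert import FlaxCLIPVisionBertForMaskedLM

model = FlaxCLIPVisionBertForMaskedLM.from_pretrained(
    "flax-community/clip-vision-bert-cc12m-70k"
)
```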
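
The tokenize-and-mask step could be implemented along these lines, assuming the multilingual BERT tokenizer. The example caption is made up, and the demo's actual code may differ.

```python
# Sketch: tokenize a caption and mask one randomly chosen token.
import random
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-uncased")

caption = "Ein Hund spielt mit einem Ball im Park."  # made-up example caption
tokens = tokenizer.tokenize(caption)

# Replace a random token with the [MASK] token.
mask_index = random.randrange(len(tokens))
tokens[mask_index] = tokenizer.mask_token

masked_caption = tokenizer.convert_tokens_to_string(tokens)
print(masked_caption)
```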
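
The English translation via `mtranslate` boils down to a single call; note that it hits the Google Translate web API, so it needs an internet connection.

```python
# Sketch: translate a (possibly non-English) caption to English with mtranslate.
from mtranslate import translate

caption = "Ein Hund spielt mit einem Ball im Park."  # made-up example caption
english_caption = translate(caption, "en")
print(english_caption)
```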
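
Turning the masked-LM scores into the top-5 predictions and a confidence bar plot could be done as follows. The random `logits` are a stand-in for the model's output at the `[MASK]` position so the snippet runs on its own, and matplotlib is used here although the demo may plot with a different library.

```python
# Sketch: top-5 predictions for the [MASK] position and a confidence bar plot.
import jax
import jax.numpy as jnp
import matplotlib.pyplot as plt
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-uncased")

# Stand-in for the model's logits at the [MASK] position, shape (vocab_size,).
logits = jax.random.normal(jax.random.PRNGKey(0), (tokenizer.vocab_size,))

probs = jax.nn.softmax(logits)
top5_ids = jnp.argsort(probs)[-5:][::-1]  # five highest-probability token ids
top5_tokens = tokenizer.convert_ids_to_tokens([int(i) for i in top5_ids])
top5_scores = [float(probs[i]) for i in top5_ids]

plt.bar(top5_tokens, top5_scores)
plt.ylabel("confidence")
plt.title("Top-5 predictions for the [MASK] token")
plt.show()
```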