This demo loads the FlaxCLIPVisionBertForMaskedLM model defined in the model directory of this repository. The weights are loaded from flax-community/clip-vision-bert-cc12m-70k, a checkpoint pre-trained for 70k steps.
One hundred random examples from the validation set are stored in cc12m_data/vqa_val.tsv, with the corresponding images in the cc12m_data/images_data directory.
To get a random example, click the Get a random example button. The caption is tokenized, and one randomly chosen token is masked by replacing it with [MASK].
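
The masking step can be sketched as follows. This is a minimal illustration, not the demo's actual code: the helper name `mask_random_token` is hypothetical, and whitespace splitting stands in for the real subword tokenizer.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_random_token(tokens, rng=None):
    """Replace one randomly chosen token with [MASK].

    Returns (masked_tokens, index, original_token). Hypothetical helper,
    written only to illustrate the step described above.
    """
    if not tokens:
        raise ValueError("cannot mask an empty token list")
    rng = rng or random.Random()
    index = rng.randrange(len(tokens))  # position of the token to hide
    masked = list(tokens)               # copy so the input is left untouched
    original = masked[index]
    masked[index] = MASK_TOKEN
    return masked, index, original


# Assumption: whitespace splitting stands in for the real subword tokenizer.
tokens = "a dog runs on the beach".split()
masked, index, original = mask_random_token(tokens, random.Random(0))
print(" ".join(masked), "| masked token:", original)
```

Returning the index and the original token alongside the masked sequence makes it easy to compare the model's predictions with the word that was hidden.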
We provide an English translation of the caption for users who are not familiar with the other languages. The translation is produced with mtranslate to keep the demo flexible; it requires an internet connection because it uses the Google Translate API.
The model predicts a score for every token in the vocabulary of the bert-base-multilingual-uncased checkpoint.
The top-5 predictions are displayed below, with their confidence scores shown as a bar plot.
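
As a rough sketch of how per-token scores can be turned into a top-5 list, the snippet below applies a softmax and keeps the five most probable tokens. The toy vocabulary, the logits, and the helper name `top_k_predictions` are made up for illustration, and treating the confidence scores as softmax probabilities is an assumption; a text bar stands in for the bar plot.

```python
import math


def top_k_predictions(logits, vocab, k=5):
    """Return the k most probable (token, probability) pairs.

    Hypothetical helper: converts raw scores to probabilities with a
    numerically stable softmax, then sorts in descending order.
    """
    if len(logits) != len(vocab):
        raise ValueError("logits and vocab must have the same length")
    max_logit = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(vocab[i], probs[i]) for i in ranked[:k]]


# Toy values (assumptions); the real model scores the full multilingual vocabulary.
vocab = ["dog", "cat", "beach", "car", "tree", "house", "bird"]
logits = [2.0, 1.0, 3.5, -1.0, 0.5, 0.0, 1.5]

top5 = top_k_predictions(logits, vocab)
for token, prob in top5:
    bar = "#" * round(prob * 40)  # crude text stand-in for the bar plot
    print(f"{token:>6} {prob:.3f} {bar}")
```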