- This demo loads the `FlaxCLIPVisionBertForSequenceClassification` model defined in the `model` directory of this repository. The checkpoint is loaded from [`flax-community/clip-vision-bert-vqa-ft-6k`](https://huggingface.co/flax-community/clip-vision-bert-vqa-ft-6k), which was pre-trained for 60k steps and then fine-tuned for 6k steps. 100 random examples from the validation set are stored in `dummy_vqa_multilingual.tsv`, with the corresponding images in the `images/val2014` directory.
- We provide an `English Translation` of the question for users who are not familiar with the other languages. The translation is done with `mtranslate`, which keeps things flexible but needs an internet connection because it uses the Google Translate API.
- The model predicts an answer from a fixed list of 3129 candidate answers, whose labels are stored in `answer_reverse_mapping.json`.
- Lastly, one can choose the `Answer Language`. This also relies on a saved dictionary, created with the `mtranslate` library, that holds translations of the 3129 answer options.
- The top-5 predictions are displayed below, with their confidence scores shown as a bar plot.
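
As a rough illustration of the last steps, the sketch below turns a vector of answer logits into top-5 answers with confidence scores. It is not the demo's actual code: the function name `top_k_answers`, the toy logits, and the toy mapping are hypothetical, and it assumes `answer_reverse_mapping.json` maps a label index (as a string key) to an answer string. Confidence scores are assumed to be softmax probabilities.

```python
import math


def top_k_answers(logits, reverse_mapping, k=5):
    """Return the k most likely (answer, probability) pairs, best first."""
    # Numerically stable softmax over the answer logits.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k highest probabilities, in descending order.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Assumption: keys are strings, as they would be after json.load.
    return [(reverse_mapping[str(i)], probs[i]) for i in top]


# Hypothetical toy data standing in for the 3129-way model output.
toy_mapping = {"0": "yes", "1": "no", "2": "2", "3": "red", "4": "dog", "5": "pizza"}
toy_logits = [2.0, 1.0, 0.5, -1.0, 0.0, 3.0]

predictions = top_k_answers(toy_logits, toy_mapping, k=5)
for answer, prob in predictions:
    print(f"{answer}: {prob:.3f}")
```

In the real demo the mapping would presumably be read from `answer_reverse_mapping.json` with `json.load`, and the resulting pairs would feed the bar plot.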