- This demo loads the `FlaxCLIPVisionMBartForConditionalGeneration` model from the [official model repo](https://huggingface.co/flax-community/clip-vit-base-patch32_mbart-large-50). 100 random validation-set examples are listed in `references.tsv`, with the corresponding images in the `images` directory.
- We provide an `English Translation` of the generated caption and the reference captions for users who are not well acquainted with the other languages. This is done using `mtranslate` to keep things flexible, and it needs an internet connection because it uses the Google Translate API. We will also add the original captions soon.
- The sidebar contains generation parameters such as `Number of Beams`, `Top-P`, and `Temperature`, which are used when generating the caption.
- One can choose the `Language` of the caption in the dropdown below to generate a caption in that particular language.
- Lastly, keeping in mind the intended future use for visually challenged people, we also provide an audio clip of the generated caption.
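
For readers unfamiliar with the `Top-P` and `Temperature` parameters, the following is a minimal, illustrative sketch of how the two typically reshape a next-token distribution before sampling. It is not the demo's own code: the function name `top_p_filter` and the toy logits are assumptions made for illustration, and the actual model relies on its library's generation routine.

```python
import math


def top_p_filter(logits, temperature=1.0, top_p=0.9):
    """Illustrative helper (hypothetical, not part of the demo).

    Applies temperature scaling, then nucleus (top-p) filtering, and
    returns the renormalized probabilities over the kept tokens.
    """
    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set()
    cumulative = 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Zero out everything else and renormalize.
    kept_mass = sum(probs[i] for i in kept)
    return [probs[i] / kept_mass if i in kept else 0.0 for i in range(len(probs))]


# Toy logits for a four-token vocabulary (made-up values).
toy_logits = [2.0, 1.0, 0.1, -1.0]
filtered = top_p_filter(toy_logits, temperature=1.0, top_p=0.9)
```

With these toy values, the least likely token is dropped at `top_p=0.9`, and lowering `temperature` shifts more probability onto the most likely token. `Number of Beams` is a separate control: it sets how many candidate sequences beam search keeps at each step.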