- This demo loads the `FlaxCLIPVisionMBartForConditionalGeneration` model defined in the `model` directory of this repository. The weights are loaded from `ckpt/ckpt-49499`, a pre-trained checkpoint with 70k steps. The `references.tsv` file contains 100 random examples from the validation set, and the corresponding images are in the `images` directory.
- We provide an `English Translation` of the generated caption and of the reference captions for users who are not well acquainted with the other languages. The translation is done with `mtranslate`, which keeps things flexible but requires an internet connection because it calls the Google Translate API. We will also add the original captions soon.
- The sidebar contains generation parameters such as `Number of Beams`, `Top-P`, and `Temperature`, which are used when generating the caption.
- Lastly, choose the `Language` of the caption from the dropdown below to generate a caption in that language.
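To give a rough idea of what the `Temperature` and `Top-P` sliders control, here is a minimal, generic sketch of temperature scaling followed by nucleus (top-p) filtering on a single next-token distribution. This is an illustrative assumption, not the demo's own code: the function name `temperature_top_p` is hypothetical, and the real model relies on its library's generation routine. `Number of Beams` (beam search) is a separate decoding strategy and is not shown here.

```python
import math


def temperature_top_p(logits, temperature=1.0, top_p=1.0):
    """Hypothetical helper: return the renormalized next-token distribution
    after temperature scaling and top-p filtering, as {token_index: prob}."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: divide logits before the softmax.
    # Values < 1 sharpen the distribution, values > 1 flatten it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p: keep the smallest set of most likely tokens whose
    # cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens; sampling would draw from this.
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


if __name__ == "__main__":
    # Three candidate tokens; with top_p=0.7 only the two most likely survive.
    print(temperature_top_p([2.0, 1.0, 0.1], temperature=1.0, top_p=0.7))
```

With `top_p=1.0` every token is kept, and lowering `temperature` concentrates more probability on the most likely token.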