This is the official checkpoint of OFA-Base finetuned on the MSCOCO Caption dataset for image captioning. It is adapted to the official code rather than Hugging Face Transformers. The model was first trained with cross-entropy loss and then with CIDEr optimization.
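
As a rough illustration of the two training stages, here is a minimal sketch in plain Python. It is not the official OFA training code. It assumes the CIDEr stage uses a reward-weighted (REINFORCE-style) loss with a baseline, which is a common way to optimize CIDEr but is not stated in this card. The function names, token probabilities, and reward values are all hypothetical.

```python
import math


def cross_entropy_loss(token_probs):
    """Stage 1: mean negative log-likelihood of the reference caption tokens.

    token_probs holds the probability the model assigns to each reference token.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def reward_weighted_loss(sampled_token_probs, sampled_reward, baseline_reward):
    """Stage 2 (assumed form): policy-gradient loss with a baseline.

    sampled_reward and baseline_reward stand in for CIDEr scores of a sampled
    caption and a baseline caption; no real CIDEr scorer is called here.
    """
    advantage = sampled_reward - baseline_reward
    log_prob = sum(math.log(p) for p in sampled_token_probs)
    return -advantage * log_prob


# Hypothetical values, for illustration only.
stage1 = cross_entropy_loss([0.5, 0.25, 0.5])
stage2 = reward_weighted_loss([0.5, 0.5], sampled_reward=1.2, baseline_reward=1.0)
print(f"stage 1 loss: {stage1:.4f}")
print(f"stage 2 loss: {stage2:.4f}")
```

In this sketch, a sampled caption that scores above the baseline gets a positive loss term, so minimizing the loss raises the caption's log-probability. A caption that scores below the baseline is pushed the other way.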

For more information, please refer to the official GitHub repository (https://github.com/OFA-Sys/OFA).

For now, we provide only the finetuned checkpoints based on the official code.

