This model is a fine-tuned Vision Encoder Decoder model trained on the coco-flickr-farsi dataset.

Framework versions

  • Transformers 4.12.5
  • PyTorch 1.9.1
  • Datasets 1.16.1
  • Tokenizers 0.10.3
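
To reproduce results, it may help to confirm that the local environment matches the versions listed above. The following is a minimal sketch, not part of the original model card: the `check_versions` helper is hypothetical, and the PyPI distribution names (for example `torch` for PyTorch) are assumptions.

```python
from importlib import metadata

# Versions copied from the "Framework versions" list.
# The PyPI distribution names used as keys are assumptions.
EXPECTED = {
    "transformers": "4.12.5",
    "torch": "1.9.1",
    "datasets": "1.16.1",
    "tokenizers": "0.10.3",
}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            # Drop local build tags such as "+cu111" before comparing.
            installed = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            installed = None
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions().items():
        status = "ok" if ok else "mismatch"
        print(f"{name}: expected {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; newer releases often remain compatible, so the report is best read as a starting point when debugging differences in behavior.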

