This model is similar to https://huggingface.co/nlpconnect/vit-gpt2-image-captioning but uses DistilGPT2 instead of GPT2 as the text decoder.
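The difference can be sketched at the configuration level. The snippet below is a minimal illustration, not this model's actual configuration: it assumes the `transformers` library is installed, uses default `ViTConfig` values (ViT-Base sized) as a stand-in for the image encoder, and represents DistilGPT2 as a GPT-2 configuration with 6 transformer layers instead of 12. It builds configuration objects only and downloads no weights.

```python
from transformers import GPT2Config, ViTConfig, VisionEncoderDecoderConfig

# Assumption: default ViTConfig values stand in for the image encoder;
# the real checkpoint may use different encoder settings.
encoder_config = ViTConfig()

# GPT2Config defaults to 12 transformer layers (GPT-2 small).
gpt2_config = GPT2Config()

# DistilGPT2 keeps the GPT-2 architecture but uses 6 layers.
distilgpt2_config = GPT2Config(n_layer=6)

# Combine encoder and decoder configs. This helper marks the text model
# as a decoder and enables cross-attention over the image features.
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(
    encoder_config, distilgpt2_config
)

print("encoder:", config.encoder.model_type)
print("decoder:", config.decoder.model_type)
print("decoder layers:", config.decoder.n_layer)
print("GPT-2 layers for comparison:", gpt2_config.n_layer)
```

The smaller decoder roughly halves the number of text-side transformer layers, which is the main structural change relative to the ViT + GPT2 model.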