Fine-tune
How can I fine-tune this model on a custom dataset, please?
Yes, it's definitely possible to fine-tune on (image, text) pairs.
Basically, each item of the dataset should be a pair of (pixel_values, labels), where the labels are the input_ids of the target sequence.
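To make that item format concrete, here is a minimal sketch of a map-style dataset, not the exact code used for this model. `CaptionDataset`, `toy_image_to_pixels` and `toy_text_to_ids` are hypothetical names used for illustration only; in a real run you would pass in the image processor and tokenizer that belong to the checkpoint and return tensors instead of lists. Padding the labels with -100 assumes the loss is PyTorch's cross-entropy, which ignores that value by default.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Label value that PyTorch's CrossEntropyLoss skips by default (ignore_index=-100).
IGNORE_INDEX = -100


class CaptionDataset:
    """Map-style dataset: item i -> {"pixel_values": ..., "labels": ...}."""

    def __init__(
        self,
        pairs: Sequence[Tuple[object, str]],
        image_to_pixels: Callable[[object], object],
        text_to_ids: Callable[[str], List[int]],
        max_length: int = 32,
    ) -> None:
        self.pairs = pairs
        self.image_to_pixels = image_to_pixels
        self.text_to_ids = text_to_ids
        self.max_length = max_length

    def __len__(self) -> int:
        return len(self.pairs)

    def __getitem__(self, idx: int) -> Dict[str, object]:
        image, caption = self.pairs[idx]
        pixel_values = self.image_to_pixels(image)
        # The labels are the input_ids of the target caption, truncated
        # and then padded to a fixed length with the ignored index.
        ids = list(self.text_to_ids(caption))[: self.max_length]
        labels = ids + [IGNORE_INDEX] * (self.max_length - len(ids))
        return {"pixel_values": pixel_values, "labels": labels}


# Toy stand-ins (assumptions) so the sketch runs without downloading anything.
TOY_VOCAB = {"a": 0, "cat": 1, "dog": 2}


def toy_image_to_pixels(image: List[List[int]]) -> List[List[float]]:
    # Scale 0..255 grayscale values to 0..1.
    return [[value / 255 for value in row] for row in image]


def toy_text_to_ids(caption: str) -> List[int]:
    return [TOY_VOCAB[word] for word in caption.split()]


pairs = [
    ([[0, 255], [255, 0]], "a cat"),
    ([[255, 255], [0, 0]], "a dog"),
]
dataset = CaptionDataset(pairs, toy_image_to_pixels, toy_text_to_ids, max_length=4)
first_item = dataset[0]
```

Because the class only implements `__len__` and `__getitem__`, it can be handed to a PyTorch `DataLoader` once the items are converted to tensors.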
@GehadAbokamar
You can refer to the following link:
https://sachinruk.github.io/blog/pytorch/huggingface/2021/12/28/vit-to-gpt2-encoder-decoder-model.html
Thank you for helping^^
I tried to fine-tune but ran into several problems. I believe I need to set up proper naming and preprocessing for the dataset, but I don't know how:
Please try this: https://ankur3107.github.io/blogs/the-illustrated-image-captioning-using-transformers/
Hi @ankur310794, I'm looking to fine-tune this model on a custom dataset; however, the two links you provided are no longer valid. Are there any other resources to assist with fine-tuning this model in PyTorch?