Fine-tuning OWLv2
#6 by kevinjeswani
Hi
I've tried fine-tuning this model using the docs I could find on fine-tuning vision transformers (https://huggingface.co/learn/computer-vision-course/en/unit3/vision-transformers/vision-transformer-for-objection-detection), but I'm completely lost.
Any suggestions on how to go about this or where I can find this information? How should the input dataset be structured?
Thanks!