Is there any finetuning code available to share with the open source community? Can you also share details of the model, like what it's based on, its resolution, and so on?
Thank you, cool work!

Hi, thanks!
We are currently building a vision language foundation model.
We fine-tuned an early checkpoint on the new dataset WebSight so that we would have a demo for the release.
Apart from that, the current model VLM_WebSight_finetuned is not really meant to be used.
We will release our final model in perhaps two months, along with a lot of details about it and finetuning scripts, as we did for Idefics with a Colab notebook.

