Are checkpoints only available with the ViT-g dimension (1408) for the Q-Former (cross-attention)?

#3
by Daromog - opened

All of these models use the ViT-g dimension (1408) in the Q-Former's cross-attention layers. Is there anywhere to get checkpoints that use the ViT-L dimension instead?

Hi,

Have the authors released checkpoints with a ViT-L vision backbone?

I found it here:
https://github.com/salesforce/LAVIS/pull/169/commits/25f86f65895f4142c18970ae11a57ae4dda2c7e2

OK, so you can probably use this conversion script to convert them to the HF format: https://github.com/huggingface/transformers/blob/main/src/transformers/models/blip_2/convert_blip_2_original_to_pytorch.py
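
Before converting, it may help to confirm which vision width a given checkpoint expects. The Q-Former's cross-attention key/value projections take the vision encoder's output as input, so their weight shape should reveal the backbone. Below is a rough sketch of that check, assuming a Q-Former hidden size of 768 (BERT-base), a ViT-g width of 1408, a ViT-L width of 1024, and the PyTorch `(out_features, in_features)` layout for linear weights. The helper names are hypothetical and not part of LAVIS or transformers.

```python
# Assumed widths: ViT-g = 1408, ViT-L = 1024; Q-Former hidden size = 768.
VISION_WIDTHS = {"ViT-g": 1408, "ViT-L": 1024}
QFORMER_HIDDEN = 768


def cross_attn_kv_shape(backbone):
    """Expected (out_features, in_features) of a cross-attention key/value weight."""
    return (QFORMER_HIDDEN, VISION_WIDTHS[backbone])


def guess_backbone(kv_weight_shape):
    """Return the backbone whose width matches the weight's input dimension, or None."""
    in_features = kv_weight_shape[1]
    for name, width in VISION_WIDTHS.items():
        if width == in_features:
            return name
    return None


if __name__ == "__main__":
    print(cross_attn_kv_shape("ViT-g"))
    print(guess_backbone((768, 1024)))
```

If the shape of a cross-attention key or value weight in the loaded state dict has 1024 as its second dimension, the checkpoint was most likely trained with ViT-L, and the target HF config would need matching vision and Q-Former encoder widths.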
