Are checkpoints only available with the ViT-g dimension (1408) for the Q-Former (cross-attention)?

by Daromog - opened

All of these models use the ViT-g dimension (1408) in the Q-Former's cross-attention layers. Is there anywhere to get checkpoints that use the ViT-L dimension instead?
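To illustrate why this matters, here is a rough sketch of how the cross-attention key/value projection shapes depend on the vision backbone width. It is only an illustration under stated assumptions: the Q-Former hidden size of 768 and the ViT-L width of 1024 are assumed typical values, and the function names are hypothetical, not from any library.

```python
# Illustrative sketch only: the helper names are hypothetical, not a library API.
# Assumption: the Q-Former uses a BERT-base style hidden size of 768.
QFORMER_HIDDEN = 768

# 1408 is the ViT-g width mentioned in this thread.
# 1024 is assumed here as the usual ViT-L width.
VISION_WIDTHS = {"ViT-g": 1408, "ViT-L": 1024}


def cross_attn_kv_shape(vision_width, qformer_hidden=QFORMER_HIDDEN):
    """Return (out_features, in_features) of a cross-attention key/value projection.

    Keys and values are computed from the vision encoder output, so the
    input dimension of these projections equals the vision encoder width.
    """
    return (qformer_hidden, vision_width)


def is_compatible(checkpoint_backbone, target_backbone):
    """True if cross-attention weights trained with one backbone fit the other."""
    return cross_attn_kv_shape(VISION_WIDTHS[checkpoint_backbone]) == \
        cross_attn_kv_shape(VISION_WIDTHS[target_backbone])


if __name__ == "__main__":
    for name, width in VISION_WIDTHS.items():
        print(name, cross_attn_kv_shape(width))
    print("ViT-g weights fit ViT-L:", is_compatible("ViT-g", "ViT-L"))
```

Under these assumptions the key/value weights differ in shape between the two backbones, so a Q-Former checkpoint trained against ViT-g cannot be loaded directly on top of a ViT-L encoder.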


Have the authors released checkpoints with a ViT-L vision backbone?



I found it here:

OK, so you can probably use this conversion script to convert them to the HF format:
