[Error] Error when executing the example code

#3
by StarCycle - opened

Hi,

If I run the model with the example code inside the repo folder, I get this error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-4-fd9293295145> in <cell line: 2>()
      1 import torch
----> 2 from modeling_siglip import SiglipVisionModel
      3 
      4 DEVICE = torch.device("cuda:0")
      5 PATCH_SIZE = 14

/content/siglip-so400m-14-980-flash-attn2-navit/modeling_siglip.py in <module>
     40     replace_return_docstrings,
     41 )
---> 42 from .configuration_siglip import SiglipConfig, SiglipTextConfig, SiglipVisionConfig
     43 
     44 

ImportError: attempted relative import with no known parent package
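For context, this error is a general Python rule rather than anything specific to this repo: `from .configuration_siglip import …` is a relative import, and relative imports are resolved against the module's parent package, which a script run or imported as a top-level module does not have. A minimal, self-contained reproduction of the same failure mode (file names here are illustrative):

```python
import pathlib
import subprocess
import sys
import tempfile

# Write a throwaway module that uses a relative import, then execute it as a
# top-level script: Python raises ImportError before even looking for the
# sibling module, exactly as with modeling_siglip.py above.
with tempfile.TemporaryDirectory() as d:
    mod = pathlib.Path(d) / "standalone.py"
    mod.write_text("from .sibling import X\n")
    proc = subprocess.run(
        [sys.executable, str(mod)], capture_output=True, text=True
    )

print("relative import" in proc.stderr)  # True
```

This is why loading the checkpoint through `AutoModel.from_pretrained(..., trust_remote_code=True)` works: transformers imports the remote files as a proper package, so the relative import resolves.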

If I replace the source files in transformers (e.g., modeling_siglip.py) with the ones from this repo, I get this error:

(screenshot of the error message)

Actually the argument does exist:

(screenshot showing the argument)

If I run the code with:

import torch
from transformers import AutoModel
model = AutoModel.from_pretrained("HuggingFaceM4/siglip-so400m-14-384-flash-attn2", trust_remote_code=True)
model.eval().cuda().half()

pixel_values = torch.randn(1, 3, 384, 384).cuda().half()
output = model.vision_model(pixel_values)

It does work, but the model only accepts images at 384×384 resolution. If I send a 512×512 image, I get a dimension-mismatch error from the position embedding.
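The mismatch follows from standard ViT patch arithmetic: the patchifying convolution (kernel size = stride = patch size) produces a fixed grid of patches per side, and the learned position-embedding table has exactly as many entries as the training-resolution grid. A quick sketch of the numbers for this checkpoint:

```python
# Patch-grid size per side for a Conv2d with kernel = stride = PATCH:
# (side - PATCH) // PATCH + 1. The position-embedding table is sized for
# the training resolution, so any other grid size fails to broadcast.
PATCH = 14

def num_patches(side: int, patch: int = PATCH) -> int:
    """Total number of patches for a square image of the given side."""
    return ((side - patch) // patch + 1) ** 2

print(num_patches(384))  # 729 positions baked into the 384-px checkpoint
print(num_patches(512))  # 1296 positions requested -> shape mismatch
```

The NaViT-style checkpoint suggested below sidesteps this by handling variable resolutions.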

Could you please modify the example code so it can be executed? How can I run the model successfully on Google Colab?

HuggingFaceM4 org

Hi @StarCycle
Have you tried model = AutoModel.from_pretrained("HuggingFaceM4/siglip-so400m-14-980-flash-attn2-navit") and then model.vision_model?

Thanks @VictorSanh !

It works with:

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("HuggingFaceM4/siglip-so400m-14-980-flash-attn2-navit", trust_remote_code=True)
model.eval().cuda().half()

pixel_values = torch.randn(1, 3, 224, 384).cuda().half()  # any resolution here
output = model.vision_model(pixel_values)

Is it necessary to specify patch_attention_mask, as in the README example?
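For readers wondering what such a mask would contain: in NaViT-style batching, images of different sizes are padded into a common canvas, and a boolean per-patch mask tells attention which patches are real. The helper below is a hypothetical sketch (the name `patch_mask` and the padding scheme are illustrative, not this repo's actual API):

```python
# Hypothetical sketch: build a per-patch attention mask for one image padded
# into a larger canvas, NaViT-style. True = real patch, False = padding.
# Assumes top-left-aligned padding; this is an illustration, not the repo API.
PATCH = 14

def patch_mask(real_h: int, real_w: int, padded_h: int, padded_w: int):
    grid_h, grid_w = padded_h // PATCH, padded_w // PATCH   # full padded grid
    real_gh, real_gw = real_h // PATCH, real_w // PATCH     # grid of real patches
    return [
        [r < real_gh and c < real_gw for c in range(grid_w)]
        for r in range(grid_h)
    ]

# A 224x384 image padded into the checkpoint's 980x980 canvas:
mask = patch_mask(224, 384, 980, 980)
print(len(mask), len(mask[0]))         # 70 70  (the 980/14 patch grid)
print(sum(sum(row) for row in mask))   # 432   (16 * 27 valid patches)
```

If the mask is omitted, the natural fallback is to treat every patch as valid, which is fine when the batch contains a single unpadded image like the snippet above.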
