Script for converting the checkpoint

#1
by HugoLaurencon HF staff - opened

Hi @gollark, could you share the script you used to convert the checkpoint to transformers? Thanks!

It's an ugly hack somewhere in here: https://github.com/osmarks/transformers-patch-siglip

I forget exactly how I did it, but you need to copy some hardcoded dimensions out of big_vision (Google's training repo). In principle they could probably be read out of the checkpoints, but I didn't bother.
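Reading the dimensions out of the checkpoint, as suggested above, could look roughly like the sketch below. It works on a plain `{name: shape}` mapping and assumes the text tower stores a token-embedding table of shape (vocab_size, hidden_size), an MLP input kernel of shape (hidden_size, intermediate_size), and a per-head query kernel of shape (hidden_size, heads, head_dim) in the Flax `DenseGeneral` style. The function name `infer_text_dims` and every key name and regex are hypothetical placeholders, not the real big_vision naming scheme; you would need to look up the actual keys first.

```python
import re
from typing import Dict, Mapping, Tuple

Shape = Tuple[int, ...]


def infer_text_dims(
    shapes: Mapping[str, Shape],
    embed_key: str,
    mlp_in_key: str,
    query_key: str,
    block_pattern: str,
) -> Dict[str, int]:
    """Infer text-tower dimensions from parameter shapes (hypothetical sketch).

    All key names and the block regex are supplied by the caller; nothing here
    encodes the actual big_vision parameter names.
    """
    # Token-embedding table, assumed to be (vocab_size, hidden_size).
    vocab_size, hidden_size = shapes[embed_key]
    # First MLP projection kernel, assumed to be (hidden_size, intermediate_size).
    mlp_in, intermediate_size = shapes[mlp_in_key]
    # Query kernel, assumed to be stored per head: (hidden_size, heads, head_dim).
    q_in, num_heads, head_dim = shapes[query_key]
    if not (mlp_in == q_in == hidden_size == num_heads * head_dim):
        raise ValueError("inconsistent shapes; check the key names")
    # Count distinct block indices captured by the regex's first group.
    block_ids = set()
    for key in shapes:
        match = re.search(block_pattern, key)
        if match:
            block_ids.add(int(match.group(1)))
    return {
        "vocab_size": vocab_size,
        "hidden_size": hidden_size,
        "intermediate_size": intermediate_size,
        "num_hidden_layers": len(block_ids),
        "num_attention_heads": num_heads,
    }
```

A vision-tower variant would presumably read `patch_size` and `hidden_size` from the patch-embedding kernel instead, but that layout is likewise an assumption to verify against the real arrays.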

Yes, exactly. Essentially, download the checkpoint with:
gsutil cp gs://big_vision/siglip/webli_en_so400m_384_58765454.npz ./
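To compare the hardcoded dimensions with the downloaded weights, one option is to list the shape of every array in the `.npz` file. The sketch below uses only NumPy's `np.load`, which returns an `NpzFile` whose `files` attribute lists the stored array names. The helper name `checkpoint_shapes` is hypothetical, and it assumes the checkpoint holds plain numeric arrays (no pickled objects).

```python
import numpy as np


def checkpoint_shapes(path):
    """Return {array name: shape} for every array stored in an .npz file."""
    # NpzFile is a context manager; arrays are read one at a time on access.
    with np.load(path) as ckpt:
        return {name: ckpt[name].shape for name in ckpt.files}


# Example usage (assumes the file was downloaded to the current directory):
# for name, shape in sorted(checkpoint_shapes("webli_en_so400m_384_58765454.npz").items()):
#     print(name, shape)
```

Each array is fully loaded just to read its shape, which is simple but touches the whole checkpoint once; for a one-off check that trade-off seems acceptable.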

I checked against the original weights that these dimensions are correct:

config.text_config.vocab_size = 32000
config.text_config.hidden_size = 1152
config.text_config.intermediate_size = 4304
config.text_config.num_hidden_layers = 27
config.text_config.num_attention_heads = 16
config.text_config.max_position_embeddings = 64

config.vision_config.hidden_size = 1152
config.vision_config.intermediate_size = 4304
config.vision_config.num_hidden_layers = 27
config.vision_config.num_attention_heads = 16
config.vision_config.image_size = 384
config.vision_config.patch_size = 14

(same as you did)
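A few derived quantities follow directly from the values above and make a quick consistency check. This is an illustrative sketch, not the check described in the thread: it copies the numbers into plain dicts (the names `text`, `vision`, `head_dim`, `num_patches` are hypothetical) and assumes non-overlapping patches produced by a stride-`patch_size` convolution with no padding, so the 6 leftover border pixels (384 = 27 × 14 + 6) are not covered.

```python
# Values copied from the config lines above.
text = {"hidden_size": 1152, "num_attention_heads": 16, "max_position_embeddings": 64}
vision = {"hidden_size": 1152, "num_attention_heads": 16, "image_size": 384, "patch_size": 14}


def head_dim(tower):
    """Per-head width; hidden_size must split evenly across the heads."""
    hidden, heads = tower["hidden_size"], tower["num_attention_heads"]
    if hidden % heads != 0:
        raise ValueError("hidden_size is not divisible by num_attention_heads")
    return hidden // heads


def num_patches(tower):
    """Patch tokens per image, assuming non-overlapping, unpadded patches."""
    per_side = tower["image_size"] // tower["patch_size"]  # 384 // 14 == 27
    return per_side * per_side


print(head_dim(text), head_dim(vision), num_patches(vision))  # 72 72 729
```

Under these assumptions each image becomes a 27 × 27 grid of 729 patch tokens, and both towers use 72-dimensional attention heads.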

HugoLaurencon changed discussion status to closed
