Problem when loading weights
Hi,
I tried loading the pretrained weights into a model built from sam2_hiera_s.yaml, but I get mismatches for many of the image encoder weights. What am I doing wrong?
Here is the full error log:
RuntimeError: Error(s) in loading state_dict for SAM2VideoPredictor:
Missing key(s) in state_dict: "image_encoder.trunk.blocks.12.norm1.weight", "image_encoder.trunk.blocks.12.norm1.bias", "image_encoder.trunk.blocks.12.attn.qkv.weight", "image_encoder.trunk.blocks.12.attn.qkv.bias", "image_encoder.trunk.blocks.12.attn.proj.weight", "image_encoder.trunk.blocks.12.attn.proj.bias", "image_encoder.trunk.blocks.12.norm2.weight", "image_encoder.trunk.blocks.12.norm2.bias", "image_encoder.trunk.blocks.12.mlp.layers.0.weight", "image_encoder.trunk.blocks.12.mlp.layers.0.bias", "image_encoder.trunk.blocks.12.mlp.layers.1.weight", "image_encoder.trunk.blocks.12.mlp.layers.1.bias", "image_encoder.trunk.blocks.13.norm1.weight", "image_encoder.trunk.blocks.13.norm1.bias", "image_encoder.trunk.blocks.13.attn.qkv.weight", "image_encoder.trunk.blocks.13.attn.qkv.bias", "image_encoder.trunk.blocks.13.attn.proj.weight", "image_encoder.trunk.blocks.13.attn.proj.bias", "image_encoder.trunk.blocks.13.norm2.weight", "image_encoder.trunk.blocks.13.norm2.bias", "image_encoder.trunk.blocks.13.mlp.layers.0.weight", "image_encoder.trunk.blocks.13.mlp.layers.0.bias", "image_encoder.trunk.blocks.13.mlp.layers.1.weight", "image_encoder.trunk.blocks.13.mlp.layers.1.bias", "image_encoder.trunk.blocks.14.norm1.weight", "image_encoder.trunk.blocks.14.norm1.bias", "image_encoder.trunk.blocks.14.attn.qkv.weight", "image_encoder.trunk.blocks.14.attn.qkv.bias", "image_encoder.trunk.blocks.14.attn.proj.weight", "image_encoder.trunk.blocks.14.attn.proj.bias", "image_encoder.trunk.blocks.14.norm2.weight", "image_encoder.trunk.blocks.14.norm2.bias", "image_encoder.trunk.blocks.14.mlp.layers.0.weight", "image_encoder.trunk.blocks.14.mlp.layers.0.bias", "image_encoder.trunk.blocks.14.mlp.layers.1.weight", "image_encoder.trunk.blocks.14.mlp.layers.1.bias", "image_encoder.trunk.blocks.14.proj.weight", "image_encoder.trunk.blocks.14.proj.bias", "image_encoder.trunk.blocks.15.norm1.weight", "image_encoder.trunk.blocks.15.norm1.bias", "image_encoder.trunk.blocks.15.attn.qkv.weight", "image_encoder.trunk.blocks.15.attn.qkv.bias", "image_encoder.trunk.blocks.15.attn.proj.weight", "image_encoder.trunk.blocks.15.attn.proj.bias", "image_encoder.trunk.blocks.15.norm2.weight", "image_encoder.trunk.blocks.15.norm2.bias", "image_encoder.trunk.blocks.15.mlp.layers.0.weight", "image_encoder.trunk.blocks.15.mlp.layers.0.bias", "image_encoder.trunk.blocks.15.mlp.layers.1.weight", "image_encoder.trunk.blocks.15.mlp.layers.1.bias".
Unexpected key(s) in state_dict: "image_encoder.trunk.blocks.10.proj.weight", "image_encoder.trunk.blocks.10.proj.bias".
size mismatch for image_encoder.trunk.blocks.10.attn.qkv.weight: copying a param with shape torch.Size([2304, 384]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
size mismatch for image_encoder.trunk.blocks.10.attn.qkv.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([1152]).
size mismatch for image_encoder.trunk.blocks.10.attn.proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([384, 384]).
size mismatch for image_encoder.trunk.blocks.10.attn.proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for image_encoder.trunk.blocks.10.norm2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for image_encoder.trunk.blocks.10.norm2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for image_encoder.trunk.blocks.10.mlp.layers.0.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
size mismatch for image_encoder.trunk.blocks.10.mlp.layers.0.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for image_encoder.trunk.blocks.10.mlp.layers.1.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
size mismatch for image_encoder.trunk.blocks.10.mlp.layers.1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for image_encoder.trunk.blocks.11.norm1.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for image_encoder.trunk.blocks.11.norm1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for image_encoder.trunk.blocks.11.attn.qkv.weight: copying a param with shape torch.Size([2304, 768]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
size mismatch for image_encoder.trunk.blocks.11.attn.qkv.bias: copying a param with shape torch.Size([2304]) from checkpoint, the shape in current model is torch.Size([1152]).
size mismatch for image_encoder.trunk.blocks.11.attn.proj.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([384, 384]).
size mismatch for image_encoder.trunk.blocks.11.attn.proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for image_encoder.trunk.blocks.11.norm2.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for image_encoder.trunk.blocks.11.norm2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for image_encoder.trunk.blocks.11.mlp.layers.0.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
size mismatch for image_encoder.trunk.blocks.11.mlp.layers.0.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for image_encoder.trunk.blocks.11.mlp.layers.1.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
size mismatch for image_encoder.trunk.blocks.11.mlp.layers.1.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).
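
Reading the log: the checkpoint has trunk blocks only up to index 11 and widens to 768 channels at block 10 (where it also has an extra proj layer), while the model built from sam2_hiera_s.yaml expects blocks up to index 15 and has its proj layer at block 14. That pattern would be consistent with the checkpoint and the config describing differently sized encoders, though the log alone does not prove it. A minimal sketch for checking this before calling load_state_dict is below. The helper names (diff_shapes, block_indices) are hypothetical and not part of SAM 2 or PyTorch, and the toy dictionaries contain only a few keys taken from the log, with placeholder shapes where the log does not report one.

```python
import re
from typing import Dict, Iterable, List, Mapping, Tuple

Shape = Tuple[int, ...]


def diff_shapes(model: Mapping[str, Shape], ckpt: Mapping[str, Shape]):
    """Compare two name -> shape mappings the way load_state_dict reports them.

    Returns (missing, unexpected, mismatched):
      missing    - keys the model expects but the checkpoint lacks
      unexpected - keys the checkpoint has but the model does not
      mismatched - shared keys with different shapes, as (ckpt_shape, model_shape)
    """
    missing: List[str] = sorted(k for k in model if k not in ckpt)
    unexpected: List[str] = sorted(k for k in ckpt if k not in model)
    mismatched: Dict[str, Tuple[Shape, Shape]] = {
        k: (tuple(ckpt[k]), tuple(model[k]))
        for k in model
        if k in ckpt and tuple(ckpt[k]) != tuple(model[k])
    }
    return missing, unexpected, mismatched


def block_indices(keys: Iterable[str],
                  prefix: str = "image_encoder.trunk.blocks.") -> List[int]:
    """Collect the distinct block indices that appear after `prefix`."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.")
    found = set()
    for key in keys:
        match = pattern.match(key)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


# Toy data: key names come from the error log above.
model_shapes = {
    "image_encoder.trunk.blocks.10.norm2.weight": (384,),  # shape from the log
    "image_encoder.trunk.blocks.12.norm1.weight": (384,),  # placeholder shape
}
ckpt_shapes = {
    "image_encoder.trunk.blocks.10.norm2.weight": (768,),  # shape from the log
    "image_encoder.trunk.blocks.10.proj.bias": (768,),     # placeholder shape
}

missing, unexpected, mismatched = diff_shapes(model_shapes, ckpt_shapes)
print("missing:", missing)
print("unexpected:", unexpected)
print("mismatched:", mismatched)
print("model blocks:", block_indices(model_shapes))
print("checkpoint blocks:", block_indices(ckpt_shapes))
```

With real objects, the model mapping can be built as {k: tuple(v.shape) for k, v in model.state_dict().items()}, and the checkpoint mapping in the same way from the dictionary returned by torch.load(path, map_location="cpu"). Whether the weights sit at the top level of that dictionary or under a wrapping key is an assumption to verify by printing its keys first. If the two block index lists differ in length, the config and the checkpoint most likely do not belong together.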