Why is the sequence length not equal to the number of patches?

#1
by sugarExcess - opened

The model's last_hidden_state has shape (batch_size, sequence_length, hidden_dim).
The sequence_length is 50, but from my understanding of the ViT paper it should equal the number of patches: (224*224) / (16*16) = 196.
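For reference, here is the patch arithmetic I'm going by (a quick sketch; 224 and 16 are the ViT-Base/16 defaults from the paper, not values read from any checkpoint):

```python
# patch count for a square image split into square patches,
# using the ViT-Base/16 defaults as an assumption
image_size = 224
patch_size = 16
num_patches = (image_size // patch_size) ** 2
print(num_patches)  # 196
```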
I checked the original model, and its sequence_length is indeed 196.
What am I missing?

Hello, using the example provided in the model card, I get:

# `outputs` is produced by running the model card example unchanged
print(outputs['logits'].shape)

torch.Size([1, 196, 768])

You may have modified the configuration, or your input image may be smaller than the expected size, yielding fewer patches than 224*224 does.
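One way to narrow this down is to read patch_size and image_size from the checkpoint's config and recompute the expected sequence length. A minimal sketch, assuming the google/vit-base-patch16-224 checkpoint purely as an example (substitute whichever one you actually load); note that ViTModel prepends a learnable [CLS] token, so last_hidden_state has one more position than the raw patch count:

```python
from transformers import ViTModel

# example checkpoint for illustration; use the one you actually load
model = ViTModel.from_pretrained("google/vit-base-patch16-224")
cfg = model.config

num_patches = (cfg.image_size // cfg.patch_size) ** 2
print(num_patches)      # 196 patches for patch size 16 at 224x224
print(num_patches + 1)  # 197: ViTModel also prepends a [CLS] token

# purely as arithmetic, one way to land at 50 positions is a larger
# patch size at the same resolution: (224 // 32) ** 2 + 1 == 50
```

If the printed values don't match what you see from last_hidden_state, the config or input resolution differs from what you expect.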

Thank you for the response!

sugarExcess changed discussion status to closed
