Francesco committed on Apr 22, 2021 (commit 5783307, parent 5847f6f): commit files to HF hub

# vit_base_patch16_384

Implementation of the Vision Transformer (ViT) proposed in [An Image Is Worth 16x16 Words: Transformers For Image Recognition At Scale](https://arxiv.org/pdf/2010.11929.pdf).

The following image from the authors shows the ViT architecture.

![image](https://github.com/FrancescoSaverioZuppichini/glasses/blob/develop/docs/_static/images/ViT.png?raw=true)
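To make the "16x16 words" idea concrete, the sketch below cuts an image into non-overlapping square patches and flattens each one into a vector; in ViT, each such vector is then linearly projected into a token. This is a plain-Python illustration on nested lists indexed as `image[row][col][channel]`, not the glasses implementation, and `patchify` is a hypothetical helper. The flattening order (pixel by pixel, channels innermost) is an arbitrary choice here.

```python
from typing import List

Image = List[List[List[float]]]  # indexed as image[row][col][channel]


def patchify(image: Image, patch_size: int) -> List[List[float]]:
    """Split an H x W x C image into flattened patch_size x patch_size patches.

    Patches are returned in row-major order; each one holds
    patch_size * patch_size * C values.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Collect every channel value of every pixel inside this patch.
            patch = [
                value
                for row in image[top : top + patch_size]
                for pixel in row[left : left + patch_size]
                for value in pixel
            ]
            patches.append(patch)
    return patches


# A 32x32 RGB image with 16x16 patches yields 2 * 2 = 4 patches,
# each with 16 * 16 * 3 = 768 values.
image = [[[0.0, 0.0, 0.0] for _ in range(32)] for _ in range(32)]
patches = patchify(image, 16)
print(len(patches), len(patches[0]))  # 4 768
```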

The following variants are available as class methods:

```python
from glasses.models.classification.vit import ViT

ViT.vit_small_patch16_224()
ViT.vit_base_patch16_224()
ViT.vit_base_patch16_384()
ViT.vit_base_patch32_384()
ViT.vit_huge_patch16_224()
ViT.vit_huge_patch32_384()
ViT.vit_large_patch16_224()
ViT.vit_large_patch16_384()
ViT.vit_large_patch32_384()
```
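The variant names encode the configuration: `patchP` is the patch side length and the trailing number is the input resolution. Assuming the standard ViT layout, where the sequence consists of (image_size / patch_size)² patch tokens plus one class token, the sketch below derives the sequence length from a name. This agrees with the 197 tokens shown in the feature shapes for `vit_base_patch16_224` in the examples. `parse_variant` and `ViTVariant` are hypothetical helpers, not part of glasses.

```python
import re
from typing import NamedTuple


class ViTVariant(NamedTuple):
    size: str  # e.g. "small", "base", "large" or "huge"
    patch_size: int
    image_size: int

    @property
    def num_patches(self) -> int:
        # Non-overlapping square patches tile the square input image.
        return (self.image_size // self.patch_size) ** 2

    @property
    def seq_len(self) -> int:
        # Assumes a single class token is prepended to the patch tokens.
        return self.num_patches + 1


def parse_variant(name: str) -> ViTVariant:
    """Parse names such as 'vit_base_patch16_384'."""
    match = re.fullmatch(r"vit_([a-z]+)_patch(\d+)_(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognised ViT variant name: {name!r}")
    size, patch, image = match.groups()
    return ViTVariant(size, int(patch), int(image))


for name in ["vit_base_patch16_224", "vit_base_patch16_384", "vit_large_patch32_384"]:
    variant = parse_variant(name)
    print(name, variant.num_patches, variant.seq_len)
# vit_base_patch16_224 196 197
# vit_base_patch16_384 576 577
# vit_large_patch32_384 144 145
```

For the model in this card, `vit_base_patch16_384`, a 384x384 input therefore becomes a sequence of 577 tokens, roughly three times longer than at 224x224.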

Examples:

```python
import torch
from torch import nn
from glasses.models.classification.vit import ViT, ViTTokens

# change the activation function
ViT.vit_base_patch16_224(activation=nn.SELU)
# change the number of classes (default is 1000)
ViT.vit_base_patch16_224(n_classes=100)
# pass a different block (default is TransformerEncoderBlock);
# MyCoolTransformerBlock stands for your own block class
ViT.vit_base_patch16_224(block=MyCoolTransformerBlock)
# get the features
model = ViT.vit_base_patch16_224()
# access .features once before the forward pass: this activates the forward hooks
# and tells the model that you'd like to collect the features
model.encoder.features
model(torch.randn((1, 3, 224, 224)))
# read the features collected by the encoder
features = model.encoder.features
print([x.shape for x in features])
# [torch.Size([1, 197, 768]), torch.Size([1, 197, 768]), ...]
# to change the tokens, subclass ViTTokens
class MyTokens(ViTTokens):
    def __init__(self, emb_size: int):
        super().__init__(emb_size)
        self.my_new_token = nn.Parameter(torch.randn(1, 1, emb_size))

ViT(tokens=MyTokens)
```