timm / maxvit_rmlp_base_rw_224.sw_in12k

Image Classification · timm · PyTorch · Safetensors
rwightman committed
Commit 16a083c
1 Parent(s): c6c9e29

Update model config and README

Files changed (2):
  1. README.md +21 -17
  2. model.safetensors +3 -0
README.md CHANGED
@@ -2,7 +2,7 @@
  tags:
  - image-classification
  - timm
- library_tag: timm
+ library_name: timm
  license: apache-2.0
  datasets:
  - imagenet-12k
@@ -12,7 +12,7 @@ datasets:
  A timm specific MaxViT (w/ a MLP Log-CPB (continuous log-coordinate relative position bias motivated by Swin-V2) image classification model. Trained in `timm` on ImageNet-12k (a 11821 class subset of full ImageNet-22k) by Ross Wightman.


- ### Model Variants in [maxxvit.py](https://github.com/rwightman/pytorch-image-models/blob/main/timm/models/maxxvit.py)
+ ### Model Variants in [maxxvit.py](https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/maxxvit.py)

  MaxxViT covers a number of related model architectures that share a common structure including:
  - CoAtNet - Combining MBConv (depthwise-separable) convolutional blocks in early stages with self-attention transformer blocks in later stages.
@@ -43,8 +43,9 @@ from urllib.request import urlopen
  from PIL import Image
  import timm

- img = Image.open(
-     urlopen('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'))
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))

  model = timm.create_model('maxvit_rmlp_base_rw_224.sw_in12k', pretrained=True)
  model = model.eval()
@@ -64,8 +65,9 @@ from urllib.request import urlopen
  from PIL import Image
  import timm

- img = Image.open(
-     urlopen('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'))
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))

  model = timm.create_model(
      'maxvit_rmlp_base_rw_224.sw_in12k',
@@ -82,12 +84,13 @@ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

  for o in output:
      # print shape of each feature map in output
-     # e.g.:
-     #  torch.Size([1, 128, 192, 192])
-     #  torch.Size([1, 128, 96, 96])
-     #  torch.Size([1, 256, 48, 48])
-     #  torch.Size([1, 512, 24, 24])
-     #  torch.Size([1, 1024, 12, 12])
+     # e.g.:
+     #  torch.Size([1, 64, 112, 112])
+     #  torch.Size([1, 96, 56, 56])
+     #  torch.Size([1, 192, 28, 28])
+     #  torch.Size([1, 384, 14, 14])
+     #  torch.Size([1, 768, 7, 7])
+
      print(o.shape)
  ```

@@ -97,8 +100,9 @@ from urllib.request import urlopen
  from PIL import Image
  import timm

- img = Image.open(
-     urlopen('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'))
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))

  model = timm.create_model(
      'maxvit_rmlp_base_rw_224.sw_in12k',
@@ -116,10 +120,10 @@ output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor
  # or equivalently (without needing to set num_classes=0)

  output = model.forward_features(transforms(img).unsqueeze(0))
- # output is unpooled (ie.e a (batch_size, num_features, H, W) tensor
+ # output is unpooled, a (1, 768, 7, 7) shaped tensor

  output = model.forward_head(output, pre_logits=True)
- # output is (batch_size, num_features) tensor
+ # output is a (1, num_features) shaped tensor
  ```

  ## Model Comparison
@@ -227,7 +231,7 @@ output = model.forward_head(output, pre_logits=True)
    publisher = {GitHub},
    journal = {GitHub repository},
    doi = {10.5281/zenodo.4414861},
-   howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
+   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
  }
  ```
  ```bibtex
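
The corrected shape comments in the hunks above are easy to sanity-check. A minimal sketch, assuming `timm` and `torch` are installed and the pretrained weights are reachable on the Hub; the random tensor stands in for a real preprocessed image:

```python
import timm
import torch

x = torch.randn(1, 3, 224, 224)  # dummy batch standing in for a preprocessed 224x224 image

# Per-stage feature maps: should match the updated comments,
# torch.Size([1, 64, 112, 112]) through torch.Size([1, 768, 7, 7]).
features = timm.create_model(
    'maxvit_rmlp_base_rw_224.sw_in12k',
    pretrained=True,
    features_only=True,
).eval()
with torch.no_grad():
    for o in features(x):
        print(o.shape)

# Unpooled features vs. pooled pre-logits: should match the updated
# (1, 768, 7, 7) and (1, num_features) comments.
model = timm.create_model('maxvit_rmlp_base_rw_224.sw_in12k', pretrained=True).eval()
with torch.no_grad():
    unpooled = model.forward_features(x)
    print(unpooled.shape)
    pooled = model.forward_head(unpooled, pre_logits=True)
    print(pooled.shape)
```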
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e9459e17880c626fe77caad972b7d6cf190d89f89ab36e23ba89a3b189c6e637
+ size 498520682
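
These three lines are a Git LFS pointer rather than the weights themselves: `oid` is the SHA-256 of the actual blob and `size` its length in bytes. A minimal sketch for checking a downloaded copy against the pointer; the local filename is an assumption:

```python
import hashlib
import os

path = 'model.safetensors'  # hypothetical path to the locally downloaded weights

# Stream the file through SHA-256 and compare digest and byte size
# against the oid/size recorded in the LFS pointer above.
digest = hashlib.sha256()
with open(path, 'rb') as f:
    for chunk in iter(lambda: f.read(1 << 20), b''):
        digest.update(chunk)

assert digest.hexdigest() == 'e9459e17880c626fe77caad972b7d6cf190d89f89ab36e23ba89a3b189c6e637'
assert os.path.getsize(path) == 498520682
print('local file matches the LFS pointer')
```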