merve (HF staff) committed on
Commit 97b8531
1 Parent(s): 69b6e00

Update README.md

Files changed (1)
  1. README.md +58 -0
README.md CHANGED
@@ -1,3 +1,61 @@
---
license: cc-by-nc-4.0
pipeline_tag: image-classification
---
# Hiera (Tiny)

Hiera is a hierarchical vision transformer that is a much more efficient alternative to previous hierarchical architectures such as Swin and ConvNeXt.
Vanilla Vision Transformers (ViTs; Dosovitskiy et al., 2020) are popular because they are simple and scalable, and they enable pretraining strategies such as MAE (He et al., 2022).
However, because they use the same spatial resolution and number of channels throughout the network, ViTs make inefficient use of their parameters.
This is in contrast to prior “hierarchical” or “multi-scale” models (e.g., Krizhevsky et al. (2012); He et al. (2016)), which use fewer channels but higher spatial resolution in early stages with simpler features, and more channels but lower spatial resolution later in the model with more complex features.
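
To make the stage-wise trade-off concrete, here is a minimal sketch that computes the token grid and channel width of each stage for a 224×224 input. It assumes a Hiera-Tiny-style configuration (4×4 patch stride, embedding width 96, stage depths (1, 2, 7, 2), grid halved and channels doubled at each stage transition). These values are illustrative assumptions, not read from this checkpoint.

```python
from typing import List, Tuple


def stage_shapes(
    input_size: int = 224,
    patch_stride: int = 4,
    embed_dim: int = 96,
    depths: Tuple[int, ...] = (1, 2, 7, 2),
    pool_stride: int = 2,
    dim_mul: int = 2,
) -> List[Tuple[int, int, int, int]]:
    """Return (blocks, height, width, channels) for each stage of a hierarchical ViT.

    Defaults are assumed Hiera-Tiny-like values. A vanilla ViT would instead
    keep a single grid size and a single width for every block.
    """
    side = input_size // patch_stride  # token grid right after patch embedding
    dim = embed_dim
    shapes = []
    for i, blocks in enumerate(depths):
        if i > 0:
            side //= pool_stride  # pool the token grid between stages
            dim *= dim_mul        # and widen the channels to compensate
        shapes.append((blocks, side, side, dim))
    return shapes


for stage, (blocks, h, w, c) in enumerate(stage_shapes(), start=1):
    print(f"stage {stage}: {blocks} block(s), {h}x{w} tokens, {c} channels")
```

Early stages process many tokens with few channels, and later stages process few tokens with many channels, which is the multi-scale layout described above.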

These hierarchical models, however, add many vision-specific components to reach state-of-the-art accuracy on ImageNet-1k, and that extra overhead makes them slower in practice.
Hiera addresses this by removing those components and instead letting the model learn spatial biases through MAE pretraining.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6141a88b3a0ec78603c9e784/ogkud4qc564bPX3f0bGXO.png)

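MAE pretraining hides a large fraction of the input and trains the model to reconstruct the hidden part. Hiera applies this masking to coarse "mask units" rather than to individual patches, so that the masked layout stays consistent with the pooling between stages. The sketch below only illustrates the random-masking step on a grid of mask units. The 7×7 grid (32×32-pixel units on a 224×224 image) and the 0.6 mask ratio are assumptions, and the encoder, decoder, and reconstruction loss are omitted.

```python
import random
from typing import List, Optional, Tuple


def random_unit_mask(
    grid: int = 7, mask_ratio: float = 0.6, seed: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Split grid*grid mask units into (visible, masked) index lists.

    Only the visible units would be fed to the encoder; the masked units are
    the reconstruction targets. grid=7 and mask_ratio=0.6 are assumed values.
    """
    rng = random.Random(seed)
    num_units = grid * grid
    num_masked = int(num_units * mask_ratio)
    order = list(range(num_units))
    rng.shuffle(order)  # a fresh random permutation per image
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


visible, masked = random_unit_mask(seed=0)
print(f"{len(visible)} visible units, {len(masked)} masked units")
```

Because the encoder only sees the visible units, most of the compute during pretraining is spent on a small subset of the image.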
## How to Use

Here is how to use this model to classify an image from the COCO 2017 validation set into one of the 1,000 ImageNet classes.
First, clone the repository and install the dependencies:
```bash
git lfs install
git clone https://huggingface.co/merve/hiera-tiny-ft-224-in1k
pip install timm requests
cd hiera-tiny-ft-224-in1k
```

```python
import sys

# Only needed if the `hiera` module is not importable from the current
# directory (kept from the original example); must run before `import hiera`.
sys.path.append("..")

import requests
import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode
from timm.data.constants import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD

import hiera

# Load Hiera-Tiny (MAE-pretrained on IN-1k, fine-tuned on IN-1k) for inference
model = hiera.hiera_tiny_224(pretrained=True, checkpoint="mae_in1k_ft_in1k")
model.eval()

input_size = 224
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess: resize the shorter side to 256 (for a 224 input), then center-crop
transform_list = [
    transforms.Resize(int((256 / 224) * input_size), interpolation=InterpolationMode.BICUBIC),
    transforms.CenterCrop(input_size),
]
transform_vis = transforms.Compose(transform_list)  # cropped PIL image for display
transform_norm = transforms.Compose(transform_list + [
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD),
])
img_vis = transform_vis(image)
img_norm = transform_norm(image)

# Add a batch dimension and get the predicted ImageNet class index
with torch.no_grad():
    out = model(img_norm[None, ...])
predicted_class = out.argmax(dim=-1).item()
print(predicted_class)  # the original example reports "tabby cat" for this image
```
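
The preprocessing above resizes the shorter image side to `int((256 / 224) * input_size)` pixels (256 for a 224 input) while keeping the aspect ratio, then cuts out the central `input_size` square. Here is a minimal pure-Python sketch of that size arithmetic, written to mirror our understanding of torchvision's `Resize(int)` and `CenterCrop` (integer truncation for the long side, rounded crop offsets). The helper names `resized_size` and `center_crop_box` are hypothetical and not part of torchvision, and the 640×480 input is an assumed example size.

```python
from typing import Tuple


def resized_size(width: int, height: int, short_side: int) -> Tuple[int, int]:
    """Scale (width, height) so the shorter side equals short_side, keeping aspect ratio."""
    if width <= height:
        return short_side, int(short_side * height / width)
    return int(short_side * width / height), short_side


def center_crop_box(width: int, height: int, crop: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a centered crop x crop window."""
    left = int(round((width - crop) / 2.0))
    top = int(round((height - crop) / 2.0))
    return left, top, left + crop, top + crop


input_size = 224
short_side = int((256 / 224) * input_size)  # 256 for a 224 input
w, h = resized_size(640, 480, short_side)    # assumed 640x480 input image
print((w, h), center_crop_box(w, h, input_size))
```

Resizing to 256 and cropping 224 keeps the central 87.5% of the shorter side, the usual ImageNet evaluation convention.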

You can try the fine-tuned model in [this Colab notebook](https://colab.research.google.com/drive/1WIYWaCWiv5QK-MpNr-bEvqgTS1DIW19Z?usp=sharing).