yourusername committed on
Commit 9010034
1 Parent(s): c7ce0d9

:pencil: edit README.md

Files changed (1)
  1. README.md +21 -0
README.md CHANGED
@@ -16,6 +16,27 @@ Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 2
 
  Check out the code at my [my Github repo](https://github.com/nateraw/huggingface-vit-finetune).
 
+ ## Usage
+
+ ```python
+ from transformers import ViTFeatureExtractor, ViTForImageClassification
+ from PIL import Image
+ import requests
+
+ url = 'https://www.cs.toronto.edu/~kriz/cifar-10-sample/dog10.png'
+ image = Image.open(requests.get(url, stream=True).raw)
+ feature_extractor = ViTFeatureExtractor.from_pretrained('nateraw/vit-base-patch16-224-cifar10')
+ model = ViTForImageClassification.from_pretrained('nateraw/vit-base-patch16-224-cifar10')
+ inputs = feature_extractor(images=image, return_tensors="pt")
+ outputs = model(**inputs)
+ preds = outputs.logits.argmax(dim=1)
+
+ classes = [
+ 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'
+ ]
+ classes[preds[0]]
+ ```
+
  ## Model description
 
  The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels.
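
The usage snippet added in this commit takes the argmax of the logits and looks the index up in a list of class names. As a rough illustration of that last step, the sketch below turns a vector of logits into ranked (label, probability) pairs using only the standard library. The helper names `softmax` and `top_k` and the logit values are hypothetical and made up for this example; they are not part of the commit or of the `transformers` API.

```python
import math

# CIFAR-10 class names, in the same index order as the usage snippet in this diff.
CLASSES = [
    'airplane', 'automobile', 'bird', 'cat', 'deer',
    'dog', 'frog', 'horse', 'ship', 'truck',
]


def softmax(logits):
    """Convert raw logits to probabilities (max is subtracted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most likely (label, probability) pairs, most likely first."""
    probs = softmax(logits)
    ranked = sorted(zip(CLASSES, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for a single image; these values are invented for illustration.
example_logits = [-1.2, -0.8, 0.3, 1.9, 0.1, 4.2, -0.5, 0.7, -1.6, -2.0]

for label, prob in top_k(example_logits):
    print(f"{label}: {prob:.3f}")
```

With the real model, something like `outputs.logits[0].tolist()` (using the variable names from the snippet above) should give a plain list that can be passed to `top_k` in place of `example_logits`.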