wesleyacheng committed on
Commit
01a9ac1
1 Parent(s): 78b9803

Update README.md

Files changed (1)
  1. README.md +48 -0
README.md CHANGED
@@ -4,8 +4,17 @@ metrics:
@@ -18,3 +27,42 @@ One thing about **Vision Transformers** are it has weaker inductive biases compa
  - accuracy
  - f1
  pipeline_tag: image-classification
+ widget:
+ - src: https://upload.wikimedia.org/wikipedia/commons/thumb/f/fb/Welchcorgipembroke.JPG/1200px-Welchcorgipembroke.JPG
+   example_title: Pembroke Corgi
+ - src: https://upload.wikimedia.org/wikipedia/commons/d/df/Shihtzu_%28cropped%29.jpg
+   example_title: Shih Tzu
+ - src: https://upload.wikimedia.org/wikipedia/commons/5/55/Beagle_600.jpg
+   example_title: Beagle
  ---
 
+ # Model Motivation
+
  Recently, someone asked me if you can classify dog images into their respective dog breeds instead of just differentiating cats from dogs like my last [notebook](https://www.kaggle.com/code/wesleyacheng/cat-vs-dog-image-classification-with-cnns). I say **YES**!
 
  Due to the complexity of the problem, we will be using the [**Vision Transformer**](https://paperswithcode.com/methods/category/vision-transformer), an advanced computer vision architecture introduced in the [2020 Google paper](https://arxiv.org/pdf/2010.11929v2.pdf).
 
  Luckily, for this model we use a **Vision Transformer** from [Google hosted at HuggingFace](https://huggingface.co/google/vit-base-patch16-224-in21k), pre-trained on the [ImageNet-21k dataset](https://paperswithcode.com/paper/imagenet-21k-pretraining-for-the-masses) (14 million images, 21k classes) with 16x16 patches at 224x224 resolution, to bypass that data limitation. We fine-tune this model on our "small" dog breeds dataset of around 20 thousand images from the [Stanford Dogs dataset](http://vision.stanford.edu/aditya86/ImageNetDogs/), imported by Jessica Li into [Kaggle](https://www.kaggle.com/datasets/jessicali9530/stanford-dogs-dataset), to classify dog images into 120 dog breeds!
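
As a quick sanity check on the "16x16 patches, 224x224 resolution" figures, the sketch below works out how many tokens the encoder would see per image. The helper name `vit_sequence_length` is hypothetical, and the extra leading classification token is an assumption based on the standard ViT design, not something this README states.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16,
                        add_cls_token: bool = True) -> int:
    """Number of tokens a ViT encoder sees for one square image (illustrative helper)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 224 // 16 = 14
    num_patches = patches_per_side ** 2          # 14 * 14 = 196
    # Assumption: one extra classification token is prepended to the patch sequence.
    return num_patches + (1 if add_cls_token else 0)


print(vit_sequence_length())  # 197
```

So each 224x224 image becomes a fairly short sequence of 196 patch tokens (plus the assumed classification token), which is what the transformer layers attend over.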
 
+ # Model Description
+ This model is fine-tuned from the [Google Vision Transformer (vit-base-patch16-224-in21k)](https://huggingface.co/google/vit-base-patch16-224-in21k) on the [Stanford Dogs dataset in Kaggle](https://www.kaggle.com/datasets/jessicali9530/stanford-dogs-dataset) to classify dog images into 120 dog breeds.
+
+ # Intended Uses & Limitations
+ You can use this fine-tuned model to classify dog images into 120 dog breeds, limited to those that are in the dataset.
+
+ # How to Use
+ ```python
+ from transformers import AutoImageProcessor, AutoModelForImageClassification
+ import PIL.Image
+ import requests
+
+ # Download an example image
+ url = "https://upload.wikimedia.org/wikipedia/commons/8/8b/Husky_L.jpg"
+ image = PIL.Image.open(requests.get(url, stream=True).raw)
+
+ # Load the fine-tuned image processor and model from the Hugging Face Hub
+ image_processor = AutoImageProcessor.from_pretrained("wesleyacheng/dog-breeds-multiclass-image-classification-with-vit")
+ model = AutoModelForImageClassification.from_pretrained("wesleyacheng/dog-breeds-multiclass-image-classification-with-vit")
+
+ # Preprocess the image into PyTorch tensors
+ inputs = image_processor(images=image, return_tensors="pt")
+
+ # Run the forward pass and read the raw class scores
+ outputs = model(**inputs)
+ logits = outputs.logits
+
+ # The model predicts one of the 120 Stanford Dogs breed classes
+ predicted_class_idx = logits.argmax(-1).item()
+ print("Predicted class:", model.config.id2label[predicted_class_idx])
+ ```
+
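The snippet above prints only the single best class. Since this card also reports Top-3 and Top-5 accuracy, it can be useful to look at several candidates at once. The following is a minimal sketch in plain Python, not part of the original README: `top_k_predictions` is a hypothetical helper, and the logits and labels are made up for illustration. With the real model you would presumably pass `logits[0].tolist()` and `model.config.id2label` instead.

```python
import math


def top_k_predictions(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs for one image."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by probability, highest first, and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


# Made-up logits and label names, for illustration only.
example_logits = [0.2, 3.1, 1.4, -0.5]
example_id2label = {0: "Beagle", 1: "Siberian Husky", 2: "Alaskan Malamute", 3: "Shih Tzu"}

for label, prob in top_k_predictions(example_logits, example_id2label, k=3):
    print(f"{label}: {prob:.3f}")
```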
+ # Model Training Metrics
+ | Epoch | Top-1 Accuracy | Top-3 Accuracy | Top-5 Accuracy | Macro F1 |
+ |-------|----------------|----------------|----------------|----------|
+ | 1     | 79.8%          | 95.1%          | 97.5%          | 77.2%    |
+ | 2     | 83.8%          | 96.7%          | 98.2%          | 81.9%    |
+ | 3     | 84.8%          | 96.7%          | 98.3%          | 83.4%    |
+
+ # Model Evaluation Metrics
+ | Top-1 Accuracy | Top-3 Accuracy | Top-5 Accuracy | Macro F1 |
+ |----------------|----------------|----------------|----------|
+ | 84.0%          | 97.1%          | 98.7%          | 83.0%    |
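
The README does not show how these metrics were computed, so the following is only a sketch of the usual definitions of top-k accuracy and macro F1, written in plain Python. The function names and the toy scores are hypothetical and are not taken from the author's training code.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, true_label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += true_label in top_k
    return hits / len(labels)


def macro_f1(predictions, labels, num_classes):
    """Unweighted mean of per-class F1 scores; a class with no TP, FP, or FN counts as 0."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(p == c and y == c for p, y in zip(predictions, labels))
        fp = sum(p == c and y != c for p, y in zip(predictions, labels))
        fn = sum(p != c and y == c for p, y in zip(predictions, labels))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


# Toy scores for 4 samples over 3 classes (made up for illustration).
scores = [
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.20, 0.50, 0.30],
    [0.40, 0.35, 0.25],
]
labels = [0, 1, 1, 2]
predictions = [max(range(len(row)), key=lambda i: row[i]) for row in scores]

print("top-1:", top_k_accuracy(scores, labels, k=1))
print("top-2:", top_k_accuracy(scores, labels, k=2))
print("macro F1:", macro_f1(predictions, labels, num_classes=3))
```

Macro F1 weights every breed equally regardless of how many images it has, which may explain why it sits slightly below Top-1 accuracy in the tables above.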