wesleyacheng committed · Commit 01a9ac1 · 1 Parent(s): 78b9803
Update README.md
README.md CHANGED
@@ -4,8 +4,17 @@ metrics:
 - accuracy
 - f1
 pipeline_tag: image-classification
+widget:
+- src: https://upload.wikimedia.org/wikipedia/commons/thumb/f/fb/Welchcorgipembroke.JPG/1200px-Welchcorgipembroke.JPG
+  example_title: Pembroke Corgi
+- src: https://upload.wikimedia.org/wikipedia/commons/d/df/Shihtzu_%28cropped%29.jpg
+  example_title: Shih Tzu
+- src: https://upload.wikimedia.org/wikipedia/commons/5/55/Beagle_600.jpg
+  example_title: Beagle
 ---
 
+# Model Motivation
+
 Recently, someone asked me whether you can classify dog images into their respective breeds instead of just telling cats from dogs, as in my last [notebook](https://www.kaggle.com/code/wesleyacheng/cat-vs-dog-image-classification-with-cnns). I say **YES**!
 
 Due to the complexity of the problem, we will be using the [**Vision Transformer**](https://paperswithcode.com/methods/category/vision-transformer), one of the most advanced computer vision architectures, introduced in the [2020 Google paper](https://arxiv.org/pdf/2010.11929v2.pdf).
@@ -18,3 +27,42 @@ One thing about **Vision Transformers** are it has weaker inductive biases compa
 
 Luckily, in this model, we use a **Vision Transformer** from [Google hosted at HuggingFace](https://huggingface.co/google/vit-base-patch16-224-in21k), pre-trained on the [ImageNet-21k dataset](https://paperswithcode.com/paper/imagenet-21k-pretraining-for-the-masses) (14 million images, 21k classes) with 16x16 patches at 224x224 resolution, to bypass that data limitation. We fine-tune this model on our "small" dog breeds dataset of around 20 thousand images from the [Stanford Dogs dataset](http://vision.stanford.edu/aditya86/ImageNetDogs/), imported by Jessica Li into [Kaggle](https://www.kaggle.com/datasets/jessicali9530/stanford-dogs-dataset), to classify dog images into 120 dog breeds!
 
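To make the "16x16 patches, 224x224 resolution" part concrete, here is a small sketch of the arithmetic behind the input sequence a Vision Transformer sees. The helper `vit_sequence_shape` is a hypothetical name used only for illustration and is not part of any library; it assumes square RGB images and one extra [CLS] token, as described in the ViT paper.

```python
def vit_sequence_shape(image_size=224, patch_size=16, channels=3):
    """Return (patches_per_side, num_patches, values_per_patch, sequence_length).

    Hypothetical helper for illustration only; assumes a square image.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    num_patches = per_side * per_side
    # Each patch is flattened into patch_size * patch_size * channels raw values.
    values_per_patch = patch_size * patch_size * channels
    # ViT prepends one learnable [CLS] token to the patch sequence.
    sequence_length = num_patches + 1
    return per_side, num_patches, values_per_patch, sequence_length


per_side, num_patches, values_per_patch, seq_len = vit_sequence_shape()
print(f"{per_side}x{per_side} grid -> {num_patches} patches, "
      f"{values_per_patch} values each, sequence length {seq_len}")
```

So a 224x224 image becomes a 14x14 grid of 196 patches, and the transformer attends over 197 tokens once the [CLS] token is added.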
+# Model Description
+This model is fine-tuned from the [Google Vision Transformer (vit-base-patch16-224-in21k)](https://huggingface.co/google/vit-base-patch16-224-in21k) on the [Stanford Dogs dataset in Kaggle](https://www.kaggle.com/datasets/jessicali9530/stanford-dogs-dataset) to classify dog images into 120 dog breeds.
+
+# Intended Uses & Limitations
+You can use this fine-tuned model to classify dog images into 120 dog breeds, limited to the breeds that are in the dataset.
+
+# How to Use
+```python
+from transformers import AutoImageProcessor, AutoModelForImageClassification
+from PIL import Image
+import requests
+
+# Download an example image
+url = "https://upload.wikimedia.org/wikipedia/commons/8/8b/Husky_L.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+
+# Load the fine-tuned image processor and model from the Hub
+image_processor = AutoImageProcessor.from_pretrained("wesleyacheng/dog-breeds-multiclass-image-classification-with-vit")
+model = AutoModelForImageClassification.from_pretrained("wesleyacheng/dog-breeds-multiclass-image-classification-with-vit")
+
+# Preprocess the image into PyTorch tensors
+inputs = image_processor(images=image, return_tensors="pt")
+
+outputs = model(**inputs)
+logits = outputs.logits
+
+# model predicts one of the 120 Stanford Dogs breed classes
+predicted_class_idx = logits.argmax(-1).item()
+print("Predicted class:", model.config.id2label[predicted_class_idx])
+```
+
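Since similar-looking breeds are easy to confuse, it can help to look at the few most likely breeds rather than only the single best one. Below is a minimal, dependency-free sketch of turning logits into top-k (label, probability) pairs. The function `top_k_breeds` and the toy logits and labels are hypothetical and only for illustration; with the real model you would presumably pass `outputs.logits[0].detach().tolist()` and `model.config.id2label`.

```python
import math


def top_k_breeds(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs for one image."""
    # Softmax, with the max subtracted for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Class indices sorted from most to least likely.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical logits and labels, standing in for real model outputs.
toy_id2label = {0: "beagle", 1: "husky", 2: "corgi", 3: "pug"}
toy_logits = [0.5, 3.0, 1.5, -1.0]

for label, prob in top_k_breeds(toy_logits, toy_id2label, k=3):
    print(f"{label}: {prob:.3f}")
```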
+# Model Training Metrics
+| Epoch | Top-1 Accuracy | Top-3 Accuracy | Top-5 Accuracy | Macro F1 |
+|-------|----------------|----------------|----------------|----------|
+| 1     | 79.8%          | 95.1%          | 97.5%          | 77.2%    |
+| 2     | 83.8%          | 96.7%          | 98.2%          | 81.9%    |
+| 3     | 84.8%          | 96.7%          | 98.3%          | 83.4%    |
+
+# Model Evaluation Metrics
+| Top-1 Accuracy | Top-3 Accuracy | Top-5 Accuracy | Macro F1 |
+|----------------|----------------|----------------|----------|
+| 84.0%          | 97.1%          | 98.7%          | 83.0%    |
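For readers unfamiliar with the metrics above, here is a rough sketch of how top-k accuracy and macro F1 are usually defined. This is not necessarily the code used to produce the reported numbers; the function names and toy data are hypothetical, and a class that never appears in the predictions or the labels is assumed to score an F1 of 0.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, true in zip(scores, labels):
        ranked = sorted(range(len(row)), key=row.__getitem__, reverse=True)
        hits += true in ranked[:k]
    return hits / len(labels)


def macro_f1(preds, labels, num_classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in range(num_classes):
        tp = sum(p == c and t == c for p, t in zip(preds, labels))
        fp = sum(p == c and t != c for p, t in zip(preds, labels))
        fn = sum(p != c and t == c for p, t in zip(preds, labels))
        denom = 2 * tp + fp + fn
        # Assumption: a class with no predictions and no labels scores 0.
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / num_classes


# Toy example: 4 samples, 3 classes (hypothetical scores).
scores = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3], [0.6, 0.3, 0.1]]
labels = [0, 1, 1, 2]
preds = [max(range(len(row)), key=row.__getitem__) for row in scores]

print("top-1:", top_k_accuracy(scores, labels, k=1))
print("top-2:", top_k_accuracy(scores, labels, k=2))
print("macro F1:", round(macro_f1(preds, labels, num_classes=3), 3))
```

Macro F1 weights every breed equally regardless of how many images it has, which is why it can sit slightly below top-1 accuracy when some breeds are harder than others.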