wesleyacheng committed 5305067 (parent: 1d250fb): Update README.md
Luckily, in this model, we use a **Vision Transformer** from [Google hosted at HuggingFace](https://huggingface.co/google/vit-base-patch16-224-in21k) pre-trained on the [ImageNet-21k dataset](https://paperswithcode.com/paper/imagenet-21k-pretraining-for-the-masses) (14 million images, 21k classes) with 16x16 patches at 224x224 resolution to bypass that data limitation. We will fine-tune this model on our "small" dog breeds dataset of around 20 thousand images from the [Stanford Dogs dataset](http://vision.stanford.edu/aditya86/ImageNetDogs/), imported by Jessica Li into [Kaggle](https://www.kaggle.com/datasets/jessicali9530/stanford-dogs-dataset), to classify dog images into 120 types of dog breeds!
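As a quick sanity check on that patch setup, a 224x224 input cut into 16x16 patches gives a 14x14 grid, so the transformer sees 196 patch tokens (197 including the [CLS] token):

```python
# ViT patch arithmetic for vit-base-patch16-224: 224x224 image, 16x16 patches
image_size, patch_size = 224, 16
patches_per_side = image_size // patch_size    # 14 patches along each side
num_patches = patches_per_side ** 2            # 196 patches total
seq_len = num_patches + 1                      # +1 for the [CLS] token
print(patches_per_side, num_patches, seq_len)  # 14 196 197
```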
# Model Description

This model is finetuned from the [Google Vision Transformer (vit-base-patch16-224-in21k)](https://huggingface.co/google/vit-base-patch16-224-in21k) on the [Stanford Dogs dataset in Kaggle](https://www.kaggle.com/datasets/jessicali9530/stanford-dogs-dataset) to classify dog images into 120 dog breeds.

# Intended Uses & Limitations

You can use this finetuned model to classify dog images, limited to the 120 dog breeds that are in the dataset.
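Internally, the fine-tuned checkpoint maps class indices to breed names via its config's `id2label` dictionary. A minimal sketch with a hypothetical five-breed subset (the real mapping, stored in `model.config.id2label`, covers all 120 breeds):

```python
# hypothetical subset of the 120 Stanford Dogs breed names, for illustration only;
# the full mapping lives in model.config.id2label of the fine-tuned checkpoint
breeds = ["Chihuahua", "Maltese_dog", "Beagle", "Golden_retriever", "Pug"]
id2label = {i: name for i, name in enumerate(breeds)}
label2id = {name: i for i, name in id2label.items()}
print(id2label[3])         # Golden_retriever
print(label2id["Beagle"])  # 2
```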
```python
inputs = image_processor(images=image, return_tensors="pt")

outputs = model(**inputs)
logits = outputs.logits

# model predicts one of the 120 Stanford Dogs breed classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
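Beyond the single best guess, the same logits can be ranked into a top-5 list. A minimal plain-Python sketch with stand-in scores (in practice, use `outputs.logits[0].tolist()` from the snippet above):

```python
import math

# stand-in scores for the 120 breed classes; real values come from the model's logits
logits = [0.3, 2.1, -1.0, 0.7, 1.5] + [0.0] * 115

# softmax: shift by the max for numerical stability, exponentiate, normalize
m = max(logits)
exps = [math.exp(x - m) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# indices of the five most probable classes, highest first
top5 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:5]
print(top5)  # [1, 4, 3, 0, 5]
```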