Model Overview
Model Summary
Vision Transformer (ViT) adapts the Transformer architecture, originally designed for natural language processing, to computer vision. It treats an image as a sequence of patches, much as a Transformer treats a sentence as a sequence of words. It was introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".
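To make the "image as a sequence of patches" idea concrete, here is a minimal NumPy sketch. The helper name `image_to_patches` is hypothetical and is not part of KerasHub; real ViT implementations typically fold this step into a learned projection layer, so treat this as a conceptual illustration only.

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    rows, cols = height // patch_size, width // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, channels)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, each flattened to patch_size * patch_size * C values.
    return patches.reshape(rows * cols, patch_size * patch_size * channels)

image = np.random.uniform(0, 1, size=(224, 224, 3))
patches = image_to_patches(image, patch_size=16)
print(patches.shape)  # (196, 768)
```

A 224x224 image cut into 16x16 patches yields a sequence of 196 "words", each holding 16 * 16 * 3 = 768 raw pixel values.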
Links:
- ViT Quickstart Notebook
- ViT API Documentation (coming soon)
- ViT Model Card
- KerasHub Beginner Guide
- KerasHub Model Publishing Guide
Installation
Keras and KerasHub can be installed with:
pip install -U -q keras-hub
pip install -U -q keras
Presets
Model ID | img_size | Acc | Top-5 | Parameters |
---|---|---|---|---|
Base | ||||
vit_base_patch16_224_imagenet | 224 | - | - | 85798656 |
vit_base_patch16_224_imagenet21k | 224 | - | - | 85798656 |
vit_base_patch16_384_imagenet | 384 | - | - | 86090496 |
vit_base_patch32_224_imagenet21k | 224 | - | - | 87455232 |
vit_base_patch32_384_imagenet | 384 | - | - | 87528192 |
Large | ||||
vit_large_patch16_224_imagenet | 224 | - | - | 303301632 |
vit_large_patch16_224_imagenet21k | 224 | - | - | 303301632 |
vit_large_patch16_384_imagenet | 384 | - | - | 303690752 |
vit_large_patch32_224_imagenet21k | 224 | - | - | 305510400 |
vit_large_patch32_384_imagenet | 384 | - | - | 305607680 |
Huge | ||||
vit_huge_patch14_224_imagenet21k | 224 | - | - | 630764800 |
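The preset names appear to encode the patch size and the input resolution. Under that assumption, the number of patches the model sees per image can be worked out from the name alone. The helper `patch_grid` below is hypothetical and only parses strings; it does not load or inspect any preset.

```python
import re

def patch_grid(preset_name):
    """Parse (patch_size, image_size, num_patches) from a preset name
    such as 'vit_base_patch16_224_imagenet'."""
    match = re.search(r"patch(\d+)_(\d+)", preset_name)
    if match is None:
        raise ValueError(f"unrecognized preset name: {preset_name}")
    patch_size, image_size = int(match.group(1)), int(match.group(2))
    per_side = image_size // patch_size  # patches along each image side
    return patch_size, image_size, per_side * per_side

for name in [
    "vit_base_patch16_224_imagenet",
    "vit_base_patch32_384_imagenet",
    "vit_huge_patch14_224_imagenet21k",
]:
    print(name, patch_grid(name))
```

For example, a patch size of 16 at 224x224 gives 14 * 14 = 196 patches, while a patch size of 32 at 384x384 gives 12 * 12 = 144. The original ViT design adds one class token on top of this count.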
Example Usage
Pretrained ViT model
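import keras_hub
import numpy as np

# Load a pretrained ViT image classifier from a preset.
image_classifier = keras_hub.models.ImageClassifier.from_preset(
    "vit_large_patch16_224_imagenet21k"
)
# Run a forward pass on a random batch of two 224x224 RGB images.
input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
image_classifier(input_data)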
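The call above returns one score per class for each image. A minimal sketch of turning such scores into top-5 predictions follows, assuming the output is a `(batch, num_classes)` array of unnormalized logits; if the classifier is configured to return probabilities already, the softmax step should be skipped. The function `top_k_predictions` and the random stand-in logits are illustrative, not part of KerasHub.

```python
import numpy as np

def top_k_predictions(logits, k=5):
    """Return (indices, probabilities) of the k most likely classes per row."""
    logits = np.asarray(logits, dtype="float64")
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    # Sort descending and keep the first k class indices.
    top_idx = np.argsort(-probs, axis=-1)[:, :k]
    top_probs = np.take_along_axis(probs, top_idx, axis=-1)
    return top_idx, top_probs

# Stand-in for classifier output: 2 images, 10 hypothetical classes.
fake_logits = np.random.normal(size=(2, 10))
indices, probabilities = top_k_predictions(fake_logits, k=5)
print(indices.shape, probabilities.shape)
```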
Load the backbone weights and fine-tune the model on a custom dataset.
# Load the pretrained backbone and its matching preprocessor.
backbone = keras_hub.models.Backbone.from_preset(
    "vit_large_patch16_224_imagenet21k"
)
preprocessor = keras_hub.models.ViTImageClassifierPreprocessor.from_preset(
    "vit_large_patch16_224_imagenet21k"
)
# CLASSES is your own list of class labels; it is not defined by KerasHub.
model = keras_hub.models.ViTImageClassifier(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor,
)
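The fine-tuning snippet relies on a `CLASSES` list that the card does not define. One possible way to set it up, together with integer labels for a toy dataset, is sketched below. The class names, the random images, and the `class_to_index` mapping are all hypothetical placeholders for your own data.

```python
import numpy as np

# Hypothetical label set; replace with the classes in your own dataset.
CLASSES = ["cat", "dog", "bird"]
class_to_index = {name: i for i, name in enumerate(CLASSES)}

# Stand-in data: 8 random RGB images at the preset's 224x224 resolution.
images = np.random.uniform(0, 255, size=(8, 224, 224, 3)).astype("float32")
label_names = ["cat", "dog", "bird", "dog", "cat", "bird", "bird", "dog"]
# Convert string labels to integer class indices.
labels = np.array([class_to_index[name] for name in label_names], dtype="int32")

print(images.shape, labels.shape, len(CLASSES))
```

Arrays shaped like these could then be passed to `model.fit(images, labels)`, assuming integer labels and a sparse categorical loss; the choice of optimizer, learning rate, and number of epochs depends on the dataset.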
Example Usage with Hugging Face URI
Pretrained ViT model
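import keras_hub
import numpy as np

# Load the same classifier directly from its Hugging Face URI.
image_classifier = keras_hub.models.ImageClassifier.from_preset(
    "hf://keras/vit_large_patch16_224_imagenet21k"
)
# Run a forward pass on a random batch of two 224x224 RGB images.
input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
image_classifier(input_data)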
Load the backbone weights and fine-tune the model on a custom dataset.
# Load the pretrained backbone and its matching preprocessor.
backbone = keras_hub.models.Backbone.from_preset(
    "hf://keras/vit_large_patch16_224_imagenet21k"
)
preprocessor = keras_hub.models.ViTImageClassifierPreprocessor.from_preset(
    "hf://keras/vit_large_patch16_224_imagenet21k"
)
# CLASSES is your own list of class labels; it is not defined by KerasHub.
model = keras_hub.models.ViTImageClassifier(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor,
)