Model Overview

Model Summary

Vision Transformer (ViT) adapts the Transformer architecture, originally designed for natural language processing, to computer vision. It treats an image as a sequence of patches, much as a Transformer treats a sentence as a sequence of words. It was introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".
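The patch-sequence idea can be illustrated with a minimal NumPy sketch. It is not the KerasHub implementation; the helper name `image_to_patches` is hypothetical, and the sketch only shows how a 224x224 RGB image becomes a sequence of flattened 16x16 patches. In the model described by the paper, each patch is then linearly projected, and a class token and position embeddings are added.

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into a sequence of flattened patches."""
    height, width, channels = image.shape
    assert height % patch_size == 0 and width % patch_size == 0
    rows, cols = height // patch_size, width // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, channels)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, each flattened to patch_size * patch_size * C values.
    return patches.reshape(rows * cols, patch_size * patch_size * channels)

image = np.random.uniform(0, 1, size=(224, 224, 3))
patches = image_to_patches(image, patch_size=16)
print(patches.shape)  # (196, 768)
```

A 224x224 image with 16x16 patches gives 14 x 14 = 196 patches, each holding 16 x 16 x 3 = 768 raw values.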

Installation

Keras and KerasHub can be installed with:

pip install -U -q keras-hub
pip install -U -q keras

Presets

Model ID                              img_size  Acc  Top-5  Parameters
Base
vit_base_patch16_224_imagenet         224       -    -      85798656
vit_base_patch16_224_imagenet21k      224       -    -      85798656
vit_base_patch16_384_imagenet         384       -    -      86090496
vit_base_patch32_224_imagenet21k      224       -    -      87455232
vit_base_patch32_384_imagenet         384       -    -      87528192
Large
vit_large_patch16_224_imagenet        224       -    -      303301632
vit_large_patch16_224_imagenet21k     224       -    -      303301632
vit_large_patch16_384_imagenet        384       -    -      303690752
vit_large_patch32_224_imagenet21k     224       -    -      305510400
vit_large_patch32_384_imagenet        384       -    -      305607680
Huge
vit_huge_patch14_224_imagenet21k      224       -    -      630764800
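The 384-pixel presets have slightly more parameters than their 224-pixel counterparts. The sketch below checks one plausible explanation: a larger image yields more patches, so the position embedding table needs more rows. The hidden sizes (768 for Base, 1024 for Large) come from the ViT paper and are an assumption here, since this card does not state them; the helper names are hypothetical.

```python
def num_tokens(img_size, patch_size):
    """Number of patches per image plus one class token."""
    return (img_size // patch_size) ** 2 + 1

# Hidden sizes from the ViT paper (assumption: not stated in this card).
HIDDEN_DIM = {"base": 768, "large": 1024}

# Parameter counts copied from the presets table (patch size 16 only).
PARAMS = {
    ("base", 224): 85798656,
    ("base", 384): 86090496,
    ("large", 224): 303301632,
    ("large", 384): 303690752,
}

def extra_position_params(size, patch_size=16):
    """Extra position-embedding weights when moving from 224 to 384 pixels."""
    extra_tokens = num_tokens(384, patch_size) - num_tokens(224, patch_size)
    return extra_tokens * HIDDEN_DIM[size]

for size in ("base", "large"):
    observed = PARAMS[(size, 384)] - PARAMS[(size, 224)]
    print(size, observed, extra_position_params(size))
# base 291840 291840
# large 389120 389120
```

The observed differences match 380 extra tokens (577 minus 197) times the hidden size, which is consistent with the extra parameters coming from position embeddings alone.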

Example Usage

Pretrained ViT model

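import keras_hub
import numpy as np

# Load a pretrained ViT image classifier from a preset.
image_classifier = keras_hub.models.ImageClassifier.from_preset(
    "vit_large_patch16_224_imagenet21k"
)

# Run a forward pass on a random batch of two 224x224 RGB images.
input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
image_classifier(input_data)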
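The classifier call returns one score per class for each image. As a rough sketch of how such scores could be turned into top-5 predictions, the NumPy-only example below uses random stand-in logits instead of real model output. The helper `top_k_predictions`, the class count of 1000, and the assumption that the scores are unnormalized logits are all illustrative and not taken from this card.

```python
import numpy as np

def top_k_predictions(logits, k=5):
    """Return (indices, probabilities) of the k most likely classes per row."""
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    # Sort class indices by descending probability and keep the first k.
    top_idx = np.argsort(-probs, axis=-1)[:, :k]
    top_probs = np.take_along_axis(probs, top_idx, axis=-1)
    return top_idx, top_probs

# Stand-in logits for a batch of 2 images and 1000 classes (hypothetical values).
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 1000))
indices, probabilities = top_k_predictions(logits, k=5)
print(indices.shape, probabilities.shape)  # (2, 5) (2, 5)
```

With a real model, the output tensor would first need converting to a NumPy array, for example with `keras.ops.convert_to_numpy`. If the classification head already applies a softmax, the ranking is unchanged and the softmax step can be dropped.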

Load the backbone weights and fine-tune the model on a custom dataset.

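import keras_hub

# Load the pretrained backbone and its matching preprocessor.
backbone = keras_hub.models.Backbone.from_preset(
    "vit_large_patch16_224_imagenet21k"
)
preprocessor = keras_hub.models.ViTImageClassifierPreprocessor.from_preset(
    "vit_large_patch16_224_imagenet21k"
)
# CLASSES is assumed to be your own list of label names; it is not defined here.
model = keras_hub.models.ViTImageClassifier(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor,
)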

Example Usage with Hugging Face URI

Pretrained ViT model

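import keras_hub
import numpy as np

# Load a pretrained ViT image classifier from the Hugging Face Hub.
image_classifier = keras_hub.models.ImageClassifier.from_preset(
    "hf://keras/vit_large_patch16_224_imagenet21k"
)

# Run a forward pass on a random batch of two 224x224 RGB images.
input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
image_classifier(input_data)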

Load the backbone weights and fine-tune the model on a custom dataset.

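import keras_hub

# Load the pretrained backbone and its matching preprocessor from the Hub.
backbone = keras_hub.models.Backbone.from_preset(
    "hf://keras/vit_large_patch16_224_imagenet21k"
)
preprocessor = keras_hub.models.ViTImageClassifierPreprocessor.from_preset(
    "hf://keras/vit_large_patch16_224_imagenet21k"
)
# CLASSES is assumed to be your own list of label names; it is not defined here.
model = keras_hub.models.ViTImageClassifier(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor,
)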