
Vision Transformer

Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224. It was introduced in the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale and further refined in the follow-up paper How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers. The weights were converted from the Ti_16-i21k-300ep-lr_0.001-aug_none-wd_0.03-do_0.0-sd_0.0.npz file hosted in the GCS buckets referenced in the original repository.
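
A minimal usage sketch follows, assuming the checkpoint is compatible with the standard `ViTImageProcessor`/`ViTModel` classes from the transformers library; the example image URL is an arbitrary sample, not part of this model card.

```python
# Minimal feature-extraction sketch (assumes transformers ViT compatibility).
from PIL import Image
import requests
import torch
from transformers import ViTImageProcessor, ViTModel

model_id = "cs-giung/vit-tiny-patch16-imagenet21k-augreg"  # repository for this model card

processor = ViTImageProcessor.from_pretrained(model_id)
model = ViTModel.from_pretrained(model_id)

# Example input image (COCO sample); replace with your own image.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Last hidden state has shape (batch, 1 + num_patches, hidden_dim);
# index 0 along the sequence dimension is the [CLS] token embedding.
features = outputs.last_hidden_state
print(features.shape)
```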

Model size: 9.74M parameters (Safetensors, F32 tensors).
