JaxNN conversion of the timm vit_base_patch32_224.sam_in1k Vision Transformer checkpoint.

Model Details

  • Architecture: vit_base_patch32_224
  • Source: timm/vit_base_patch32_224.sam_in1k

Model card for vit_base_patch32_224.sam_in1k

A Vision Transformer (ViT) image classification model, trained on ImageNet-1k using Sharpness-Aware Minimization (SAM).
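
The architecture name encodes the input geometry: 224×224 pixel images split into 32×32 pixel patches. Here is a minimal sketch of the resulting token count, assuming the standard ViT layout with one class token prepended to the patch sequence. The variable names are illustrative and not part of JaxNN.

```python
# Values read off the model name vit_base_patch32_224.
image_size = 224
patch_size = 32

# Number of non-overlapping patches along one side of the image.
patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2

# Assumption: one class token is prepended, as in the standard ViT design.
sequence_length = num_patches + 1

print(patches_per_side, num_patches, sequence_length)  # 7 49 50
```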

Model Usage

Image Classification

from urllib.request import urlopen

import jax
from PIL import Image

import jaxnn

img = Image.open(urlopen(
    "https://huggingface.co/datasets/huggingface/cats-image/resolve/main/cats_image.jpeg"
))

model = jaxnn.create_model("vit_base_patch32_224.sam_in1k", pretrained=True)
model.eval()

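# Build the preprocessing pipeline from the model's own data configuration.
data_config = jaxnn.data.resolve_model_data_config(model)
transforms = jaxnn.data.create_transform(**data_config, is_training=False)

# Add a leading batch dimension before running the model.
x = jax.numpy.expand_dims(transforms(img), 0)
output = model(x, deterministic=True)

# Convert logits to percentages and keep the five highest-scoring classes.
top5_probabilities, top5_class_indices = jax.lax.top_k(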
    jax.nn.softmax(output, axis=-1) * 100,
    k=5,
)
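
To make the post-processing step concrete, the sketch below applies the same softmax and top-k calls to a small hand-written logit vector instead of real model output. The six-class logits are made up for illustration; a real ImageNet-1k output would have 1000 classes.

```python
import jax
import jax.numpy as jnp

# Toy logits for a batch of one image over six hypothetical classes.
logits = jnp.array([[2.0, 0.5, -1.0, 3.0, 0.0, 1.0]])

# Softmax turns logits into probabilities; multiplying by 100 gives percentages.
percentages = jax.nn.softmax(logits, axis=-1) * 100

# top_k returns the largest values along the last axis, in descending order,
# together with their indices.
top_values, top_indices = jax.lax.top_k(percentages, k=3)

for score, idx in zip(top_values[0].tolist(), top_indices[0].tolist()):
    print(f"class {idx}: {score:.1f}%")
```

The returned indices refer to positions in the class dimension, so mapping them to human-readable labels requires a separate ImageNet-1k label list.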

Image Embeddings

from urllib.request import urlopen

import jax
from PIL import Image

import jaxnn

img = Image.open(urlopen(
    "https://huggingface.co/datasets/huggingface/cats-image/resolve/main/cats_image.jpeg"
))

model = jaxnn.create_model(
    "vit_base_patch32_224.sam_in1k",
    pretrained=True,
    num_classes=0,  # drop the classifier head so the model returns embeddings
)
model.eval()

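# Build the preprocessing pipeline from the model's own data configuration.
data_config = jaxnn.data.resolve_model_data_config(model)
transforms = jaxnn.data.create_transform(**data_config, is_training=False)

# Add a leading batch dimension before running the model.
x = jax.numpy.expand_dims(transforms(img), 0)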
output = model(x, deterministic=True)
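
A common use of such embeddings is comparing images by cosine similarity. The sketch below uses random stand-in vectors rather than real model output. The `(1, 768)` shape is an assumption based on the ViT-Base hidden size and on timm's behavior with `num_classes=0`; it has not been checked against JaxNN. `cosine_similarity` is a hypothetical helper, not a JaxNN function.

```python
import jax
import jax.numpy as jnp

# Stand-ins for two embedding batches; the (1, 768) shape is an assumption
# based on the ViT-Base hidden size, not something verified against JaxNN.
key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
emb_a = jax.random.normal(key_a, (1, 768))
emb_b = jax.random.normal(key_b, (1, 768))


def cosine_similarity(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (batch, dim) arrays."""
    a = a / (jnp.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b = b / (jnp.linalg.norm(b, axis=-1, keepdims=True) + eps)
    return jnp.sum(a * b, axis=-1)


print(cosine_similarity(emb_a, emb_a))  # an embedding compared with itself
print(cosine_similarity(emb_a, emb_b))  # two unrelated random vectors
```

With real data, `emb_a` and `emb_b` would be the `output` arrays obtained from two different images.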

Citation

@article{chen2021vision,
  title={When vision transformers outperform resnets without pre-training or strong data augmentations},
  author={Chen, Xiangning and Hsieh, Cho-Jui and Gong, Boqing},
  journal={arXiv preprint arXiv:2106.01548},
  year={2021}
}
@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}
@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}