Perceiver IO image classifier
This model is a Perceiver IO model pretrained on ImageNet (14 million images, 1,000 classes). It is weight-equivalent
to the deepmind/vision-perceiver-fourier model but is based on the implementation classes of the perceiver-io library.
It can be created from the deepmind/vision-perceiver-fourier model with a library-specific conversion utility.
Both models generate equal output for the same input.
The content of the deepmind/vision-perceiver-fourier model card also applies to this model, except for the usage
examples. Refer to that card for further model and training details.
Model description
The model is specified in Appendix A of the Perceiver IO paper (2D Fourier features).
Intended use and limitations
The model can be used for image classification.
Usage examples
To use this model, you first need to install the perceiver-io library with the text extension:
pip install "perceiver-io[text]"
Then the model can be used with PyTorch. Either use the model and image processor directly:
import requests
from PIL import Image
from transformers import AutoModelForImageClassification, AutoImageProcessor
from perceiver.model.vision import image_classifier # auto-class registration
repo_id = "krasserm/perceiver-io-img-clf"
# An image of a baseball player from MS-COCO validation set
url = "http://images.cocodataset.org/val2017/000000507223.jpg"
image = Image.open(requests.get(url, stream=True).raw)
# Load the model and its image processor from the Hugging Face Hub
model = AutoModelForImageClassification.from_pretrained(repo_id)
processor = AutoImageProcessor.from_pretrained(repo_id)
# Preprocess the image and select the class with the highest logit
processed = processor(image, return_tensors="pt")
prediction = model(**processed).logits.argmax(dim=-1)
print(f"Predicted class = {model.config.id2label[prediction.item()]}")
Predicted class = ballplayer, baseball player
or use an image-classification pipeline:
import requests
from PIL import Image
from transformers import pipeline
from perceiver.model.vision import image_classifier # auto-class registration
repo_id = "krasserm/perceiver-io-img-clf"
# An image of a baseball player from MS-COCO validation set
url = "http://images.cocodataset.org/val2017/000000507223.jpg"
image = Image.open(requests.get(url, stream=True).raw)
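# The pipeline loads the model and image processor and handles preprocessing
classifier = pipeline("image-classification", model=repo_id)
# The result is a list of predictions; the first entry is the top-scoring class
prediction = classifier(image)
print(f"Predicted class = {prediction[0]['label']}")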
Predicted class = ballplayer, baseball player
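The pipeline returns a list of dicts with `label` and `score` keys rather than a single label, so more than the top class can be reported. The following is a minimal sketch of a formatting helper; `format_top_k` is a hypothetical name, and the `sample` scores are made up for illustration and are not real model output.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def format_top_k(predictions: List[Prediction], k: int = 3) -> List[str]:
    """Return one 'label: score' line for each of the k highest-scoring predictions."""
    # Sort defensively so the helper does not rely on the input order
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f"{p['label']}: {p['score']:.3f}" for p in ranked[:k]]


# Hypothetical pipeline output (made-up scores, not real model output)
sample = [
    {"label": "ballplayer, baseball player", "score": 0.9},
    {"label": "baseball", "score": 0.05},
    {"label": "football helmet", "score": 0.01},
]
lines = format_top_k(sample, k=2)
print("\n".join(lines))
```

With a real pipeline, `format_top_k(classifier(image))` would take the place of the `sample` list.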
Model conversion
The krasserm/perceiver-io-img-clf model has been created from the source deepmind/vision-perceiver-fourier model with:
from perceiver.model.vision.image_classifier import convert_model
convert_model(
    save_dir="krasserm/perceiver-io-img-clf",
    source_repo_id="deepmind/vision-perceiver-fourier",
    push_to_hub=True,
)
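The statement that both models generate equal output can be checked by running them on the same image. The following is a minimal sketch and not part of the perceiver-io library: `max_abs_diff` and `compare_logits` are hypothetical helper names, and the sketch assumes a transformers 4.x release in which `PerceiverForImageClassificationFourier` accepts pixel values through its `inputs` argument.

```python
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def compare_logits(url: str) -> float:
    """Run both models on the image at `url` and return the largest logit difference."""
    # Heavy imports are deferred so max_abs_diff stays usable without them
    import requests
    import torch
    from PIL import Image
    from transformers import (
        AutoImageProcessor,
        AutoModelForImageClassification,
        PerceiverForImageClassificationFourier,
    )
    from perceiver.model.vision import image_classifier  # auto-class registration

    image = Image.open(requests.get(url, stream=True).raw)

    # Source model, loaded with its transformers implementation class
    source_id = "deepmind/vision-perceiver-fourier"
    source_model = PerceiverForImageClassificationFourier.from_pretrained(source_id)
    source_inputs = AutoImageProcessor.from_pretrained(source_id)(image, return_tensors="pt")

    # Converted model, loaded as in the usage examples
    target_id = "krasserm/perceiver-io-img-clf"
    target_model = AutoModelForImageClassification.from_pretrained(target_id)
    target_inputs = AutoImageProcessor.from_pretrained(target_id)(image, return_tensors="pt")

    with torch.no_grad():
        # Assumption: the source model takes pixel values via `inputs=` (transformers 4.x)
        source_logits = source_model(inputs=source_inputs.pixel_values).logits
        target_logits = target_model(**target_inputs).logits

    return max_abs_diff(source_logits.flatten().tolist(), target_logits.flatten().tolist())
```

Calling `compare_logits("http://images.cocodataset.org/val2017/000000507223.jpg")` downloads both checkpoints. A result close to zero (for example below `1e-4`, a tolerance chosen here only for illustration) would be consistent with the two models being weight-equivalent.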
Citation
@article{jaegle2021perceiver,
title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
journal={arXiv preprint arXiv:2107.14795},
year={2021}
}