---
license: apache-2.0
inference: false
pipeline_tag: image-classification
datasets:
- imagenet-1k
---

# Perceiver IO image classifier

This model is a Perceiver IO model pretrained on ImageNet (14 million images, 1,000 classes). It is weight-equivalent to the [deepmind/vision-perceiver-fourier](https://huggingface.co/deepmind/vision-perceiver-fourier) model but is built on the implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It can be created from the `deepmind/vision-perceiver-fourier` model with a library-specific [conversion utility](#model-conversion). Both models generate equal output for the same input.

The content of the `deepmind/vision-perceiver-fourier` [model card](https://huggingface.co/deepmind/vision-perceiver-fourier) also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further model and training details.

## Model description

The model is specified in Appendix A of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795) (2D Fourier features).

## Intended use and limitations

The model can be used for image classification.

## Usage examples

To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation) the `perceiver-io` library with extension `vision`.

```shell
pip install perceiver-io[vision]
```

Then the model can be used with PyTorch.
Either use the model and image processor directly:

```python
import requests
from PIL import Image
from transformers import AutoModelForImageClassification, AutoImageProcessor
from perceiver.model.vision import image_classifier  # auto-class registration

repo_id = "krasserm/perceiver-io-img-clf"

# An image of a baseball player from MS-COCO validation set
url = "http://images.cocodataset.org/val2017/000000507223.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model = AutoModelForImageClassification.from_pretrained(repo_id)
processor = AutoImageProcessor.from_pretrained(repo_id)

processed = processor(image, return_tensors="pt")
prediction = model(**processed).logits.argmax(dim=-1)

print(f"Predicted class = {model.config.id2label[prediction.item()]}")
```

```
Predicted class = ballplayer, baseball player
```

or use an `image-classification` pipeline:

```python
import requests
from PIL import Image
from transformers import pipeline
from perceiver.model.vision import image_classifier  # auto-class registration

repo_id = "krasserm/perceiver-io-img-clf"

# An image of a baseball player from MS-COCO validation set
url = "http://images.cocodataset.org/val2017/000000507223.jpg"
image = Image.open(requests.get(url, stream=True).raw)

classifier = pipeline("image-classification", model=repo_id)
prediction = classifier(image)

print(f"Predicted class = {prediction[0]['label']}")
```

```
Predicted class = ballplayer, baseball player
```

## Model conversion

The `krasserm/perceiver-io-img-clf` model was created from the source `deepmind/vision-perceiver-fourier` model with:

```python
from perceiver.model.vision.image_classifier import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-img-clf",
    source_repo_id="deepmind/vision-perceiver-fourier",
    push_to_hub=True,
)
```

## Citation

```bibtex
@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
```
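
## Top-k predictions (sketch)

The usage examples print only the single most likely class. As a minimal sketch, the helper below turns one logits vector into the `k` most probable labels with a softmax. It assumes the logits have already been converted to a plain Python list and that `id2label` maps integer class indices to label strings; `top_k_labels` is a hypothetical helper written for illustration and is not part of `perceiver-io` or `transformers`.

```python
import math


def top_k_labels(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs for one image."""
    # Numerically stable softmax: subtract the maximum logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by descending probability and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


# Toy values for illustration only; these are not outputs of the model.
toy_logits = [0.1, 2.0, -1.0, 0.5]
toy_id2label = {0: "cat", 1: "dog", 2: "fish", 3: "bird"}
for label, prob in top_k_labels(toy_logits, toy_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model from the first usage example, the inputs would presumably be `model(**processed).logits[0].tolist()` and `model.config.id2label`.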