---
license: apache-2.0
inference: false
pipeline_tag: image-classification
datasets:
- imagenet-1k
---

# Perceiver IO image classifier

This model is a Perceiver IO model pretrained on ImageNet-1k (about 1.28 million training images, 1,000 classes). It is weight-equivalent
to the [deepmind/vision-perceiver-fourier](https://huggingface.co/deepmind/vision-perceiver-fourier) model but based on 
implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It can be created from 
the `deepmind/vision-perceiver-fourier` model with a library-specific [conversion utility](#model-conversion). Both 
models generate equal output for the same input. 

The content of the `deepmind/vision-perceiver-fourier` [model card](https://huggingface.co/deepmind/vision-perceiver-fourier)
also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further model and
training details.

## Model description

The model is specified in Appendix A of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795) (2D Fourier features).
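
To illustrate what "2D Fourier features" refers to, the following is a minimal NumPy sketch of Fourier position encodings for an image grid. It is not the library's implementation: the function name `fourier_position_encodings`, the choice of 64 frequency bands, and the channel ordering are assumptions made for illustration. Each pixel position is normalized to [-1, 1] and expanded with sine and cosine features at frequencies spaced linearly up to the Nyquist frequency of each axis.

```python
import numpy as np


def fourier_position_encodings(height, width, num_bands):
    """Illustrative 2D Fourier position encodings (hypothetical helper)."""
    # Pixel coordinates normalized to [-1, 1] along each axis.
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    pos = np.stack(np.meshgrid(ys, xs, indexing="ij"), axis=-1)  # (H, W, 2)

    encodings = [pos]  # keep the raw positions as two extra channels
    for dim, res in enumerate((height, width)):
        # Frequencies spaced linearly from 1 up to the Nyquist frequency res / 2.
        freqs = np.linspace(1.0, res / 2.0, num_bands)
        angles = np.pi * pos[..., dim:dim + 1] * freqs  # (H, W, num_bands)
        encodings += [np.sin(angles), np.cos(angles)]

    # Channel count: 2 raw positions + 2 axes * 2 (sin, cos) * num_bands.
    return np.concatenate(encodings, axis=-1)


enc = fourier_position_encodings(32, 32, num_bands=64)
print(enc.shape)  # (32, 32, 258)
```

The number of position channels depends only on the number of bands, not on the image resolution. In a Perceiver-style model these channels would typically be concatenated with the RGB values of each pixel before the input is passed to the encoder.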

## Intended use and limitations

The model can be used for image classification.

## Usage examples

To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation)
the `perceiver-io` library with extension `vision`.

```shell
pip install perceiver-io[vision]
```

Then the model can be used with PyTorch. Either use the model and image processor directly:

```python
import requests
from PIL import Image
from transformers import AutoModelForImageClassification, AutoImageProcessor
from perceiver.model.vision import image_classifier  # auto-class registration

repo_id = "krasserm/perceiver-io-img-clf"

# An image of a baseball player from MS-COCO validation set
url = "http://images.cocodataset.org/val2017/000000507223.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model = AutoModelForImageClassification.from_pretrained(repo_id)
processor = AutoImageProcessor.from_pretrained(repo_id)

processed = processor(image, return_tensors="pt")
prediction = model(**processed).logits.argmax(dim=-1)

print(f"Predicted class = {model.config.id2label[prediction.item()]}")
```
```
Predicted class = ballplayer, baseball player
```

or use an `image-classification` pipeline:

```python
import requests
from PIL import Image
from transformers import pipeline
from perceiver.model.vision import image_classifier  # auto-class registration

repo_id = "krasserm/perceiver-io-img-clf"

# An image of a baseball player from MS-COCO validation set
url = "http://images.cocodataset.org/val2017/000000507223.jpg"
image = Image.open(requests.get(url, stream=True).raw)

classifier = pipeline("image-classification", model=repo_id)
prediction = classifier(image)

print(f"Predicted class = {prediction[0]['label']}")
```
```
Predicted class = ballplayer, baseball player
```
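
The first example reports only the top-1 class via `argmax`. If several candidate classes with probabilities are of interest, the logits can be passed through a softmax and `torch.topk`. The sketch below assumes logits of shape `(1, num_classes)` such as `model(**processed).logits` and a label mapping such as `model.config.id2label`. The helper name `top_k_labels` is hypothetical, and the logits and labels shown are stand-in values so that the snippet runs without downloading the model.

```python
import torch


def top_k_labels(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs for one image."""
    probs = logits.softmax(dim=-1)   # logits shape: (1, num_classes)
    top = torch.topk(probs[0], k=k)  # sorted, highest probability first
    return [(id2label[i.item()], p.item()) for p, i in zip(top.values, top.indices)]


# Stand-in values (hypothetical, not actual model output).
id2label = {0: "class_a", 1: "class_b", 2: "class_c", 3: "class_d"}
logits = torch.tensor([[0.5, 3.0, -1.0, 2.0]])

for label, prob in top_k_labels(logits, id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

With the real model, `logits` and `id2label` would be replaced by `model(**processed).logits` and `model.config.id2label` from the first example.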

## Model conversion

The `krasserm/perceiver-io-img-clf` model has been created from the source `deepmind/vision-perceiver-fourier` model 
with: 

```python
from perceiver.model.vision.image_classifier import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-img-clf",
    source_repo_id="deepmind/vision-perceiver-fourier",
    push_to_hub=True,
)
```

## Citation

```bibtex
@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
```