Perceiver IO image classifier

This model is a Perceiver IO model pretrained on ImageNet (14 million images, 1,000 classes). It is weight-equivalent to the deepmind/vision-perceiver-fourier model but based on implementation classes of the perceiver-io library. It can be created from the deepmind/vision-perceiver-fourier model with a library-specific conversion utility. Both models generate equal output for the same input.

The content of the deepmind/vision-perceiver-fourier model card also applies to this model, except for the usage examples. Refer to that card for further model and training details.

Model description

The model is specified in Appendix A of the Perceiver IO paper (2D Fourier features).
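To make "2D Fourier features" more concrete, here is a minimal sketch of a Fourier position encoding for an image grid. It assumes the common Perceiver formulation: pixel coordinates scaled to [-1, 1], frequency bands spaced linearly from 1 up to the Nyquist frequency, and the raw coordinate concatenated with its sine and cosine features. The helper names (`linspace`, `fourier_features`, `encode_grid`), the grid size, the band count, and the channel ordering are illustrative assumptions, not the library's implementation.

```python
import math


def linspace(start, stop, n):
    """Return n evenly spaced values from start to stop (inclusive)."""
    if n == 1:
        return [float(start)]
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def fourier_features(pos, num_bands, max_freq):
    """Encode one coordinate in [-1, 1] as [pos, sines..., cosines...]."""
    # Bands run linearly from 1 up to the Nyquist frequency max_freq / 2.
    freqs = linspace(1.0, max_freq / 2, num_bands)
    sines = [math.sin(math.pi * f * pos) for f in freqs]
    cosines = [math.cos(math.pi * f * pos) for f in freqs]
    return [pos] + sines + cosines


def encode_grid(height, width, num_bands):
    """Return a height x width nested list of per-pixel position features."""
    ys = linspace(-1.0, 1.0, height)
    xs = linspace(-1.0, 1.0, width)
    return [
        [
            # Each axis uses its own resolution as the maximum frequency.
            fourier_features(y, num_bands, height) + fourier_features(x, num_bands, width)
            for x in xs
        ]
        for y in ys
    ]


grid = encode_grid(height=4, width=6, num_bands=3)
print(len(grid), len(grid[0]), len(grid[0][0]))  # 4 6 14
```

Each pixel receives 2 * (2 * num_bands + 1) position channels. In a Perceiver-style model these would typically be concatenated with the pixel's color channels before being passed to the encoder.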

Intended use and limitations

The model can be used for image classification.

Usage examples

To use this model you first need to install the perceiver-io library with the vision extension.

    pip install perceiver-io[vision]

Then the model can be used with PyTorch. Either use the model and image processor directly:

    import requests
    from PIL import Image
    from transformers import AutoModelForImageClassification, AutoImageProcessor
    from perceiver.model.vision import image_classifier  # auto-class registration

    repo_id = "krasserm/perceiver-io-img-clf"

    # An image of a baseball player from MS-COCO validation set
    url = "http://images.cocodataset.org/val2017/000000507223.jpg"
    image = Image.open(requests.get(url, stream=True).raw)

    model = AutoModelForImageClassification.from_pretrained(repo_id)
    processor = AutoImageProcessor.from_pretrained(repo_id)

    processed = processor(image, return_tensors="pt")
    prediction = model(**processed).logits.argmax(dim=-1)

    print(f"Predicted class = {model.config.id2label[prediction.item()]}")

Predicted class = ballplayer, baseball player

or use an image-classification pipeline:

    import requests
    from PIL import Image
    from transformers import pipeline
    from perceiver.model.vision import image_classifier  # auto-class registration

    repo_id = "krasserm/perceiver-io-img-clf"

    # An image of a baseball player from MS-COCO validation set
    url = "http://images.cocodataset.org/val2017/000000507223.jpg"
    image = Image.open(requests.get(url, stream=True).raw)

    classifier = pipeline("image-classification", model=repo_id)
    prediction = classifier(image)

    print(f"Predicted class = {prediction[0]['label']}")

Predicted class = ballplayer, baseball player
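The pipeline returns a list of dictionaries with `label` and `score` keys, so more than the top prediction is available. The following sketch shows one way to keep only the confident candidates. The helper name `top_labels`, the threshold, and the sample values are hypothetical and only illustrate the shape of the data.

```python
def top_labels(predictions, min_score=0.1, k=3):
    """Keep up to k predictions whose score is at least min_score, best first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    kept = [(p["label"], round(p["score"], 3)) for p in ranked if p["score"] >= min_score]
    return kept[:k]


# Placeholder values for illustration only, NOT actual model output.
example = [
    {"label": "class_a", "score": 0.82},
    {"label": "class_b", "score": 0.11},
    {"label": "class_c", "score": 0.04},
]
print(top_labels(example))  # [('class_a', 0.82), ('class_b', 0.11)]
```

With the real pipeline, `top_labels(classifier(image))` would apply the same filter to the model's predictions.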

Model conversion

The krasserm/perceiver-io-img-clf model has been created from the source deepmind/vision-perceiver-fourier model with:

    from perceiver.model.vision.image_classifier import convert_model

    convert_model(
        save_dir="krasserm/perceiver-io-img-clf",
        source_repo_id="deepmind/vision-perceiver-fourier",
        push_to_hub=True,
    )
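One way to sanity-check that the converted model and the source model produce equal output is to compare their logits on the same image. The sketch below is an assumption about how such a check could look, not part of the perceiver-io library. The helper names `max_abs_diff` and `compare_on_image` and the tolerance `atol` are hypothetical, and it assumes both checkpoints load through the transformers auto classes in the installed version.

```python
def max_abs_diff(logits_a, logits_b):
    """Return the largest element-wise absolute difference of two logit vectors."""
    if len(logits_a) != len(logits_b):
        raise ValueError("logit vectors differ in length")
    return max(abs(a - b) for a, b in zip(logits_a, logits_b))


def compare_on_image(image, atol=1e-4):
    """Run both checkpoints on one PIL image and report whether the logits agree."""
    # Heavy imports stay local so the helper above is usable without them.
    import torch
    from transformers import AutoImageProcessor, AutoModelForImageClassification
    from perceiver.model.vision import image_classifier  # noqa: F401 (auto-class registration)

    logits = []
    for repo_id in ("deepmind/vision-perceiver-fourier", "krasserm/perceiver-io-img-clf"):
        processor = AutoImageProcessor.from_pretrained(repo_id)
        model = AutoModelForImageClassification.from_pretrained(repo_id).eval()
        with torch.no_grad():
            out = model(**processor(image, return_tensors="pt")).logits[0]
        logits.append(out.tolist())

    # atol is an assumed tolerance for floating-point noise.
    return max_abs_diff(logits[0], logits[1]) <= atol
```

Calling `compare_on_image(image)` with the image loaded in the usage examples downloads both checkpoints and returns `True` if the logits agree within the tolerance.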

Citation

@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
