AIM: Autoregressive Image Models

Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, and Armand Joulin

This software project accompanies the research paper, Scalable Pre-training of Large Autoregressive Image Models.

We introduce AIM, a collection of vision models pre-trained with an autoregressive generative objective. We show that autoregressive pre-training of image features exhibits scaling properties similar to those of its textual counterpart (i.e., Large Language Models). Specifically, we highlight two findings:

  1. the model capacity can be trivially scaled to billions of parameters, and
  2. AIM effectively leverages large collections of uncurated image data.


Please install PyTorch using the official installation instructions. Afterward, install the package as:

pip install git+


The example below loads a model from the HuggingFace Hub and runs it on a single image:

from PIL import Image

from aim.torch.models import AIMForImageClassification
from import val_transforms

img =
model = AIMForImageClassification.from_pretrained("apple/aim-600M")
transform = val_transforms()

inp = transform(img).unsqueeze(0)
logits, features = model(inp)
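
To turn the returned logits into readable predictions, one option is a softmax followed by a top-k selection. The sketch below is only an illustration: `top_k_predictions` is a hypothetical helper, not part of the `aim` package, and it assumes `logits` has shape (1, num_classes) so that `logits[0].tolist()` yields a plain list of floats.

```python
import math


def top_k_predictions(logits_row, k=5):
    """Return the k most likely (class_index, probability) pairs.

    Hypothetical helper, not part of the aim package.
    `logits_row` is a plain list of floats, one per class.
    """
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits_row)
    exps = [math.exp(x - max_logit) for x in logits_row]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank class indices by descending probability and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Toy input standing in for one row of model logits.
example = top_k_predictions([0.1, 2.0, -1.0, 0.5], k=2)
```

With the model output from the example above, the call would be `top_k_predictions(logits[0].tolist(), k=5)`, again assuming a batch of one image.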

ImageNet-1k results (frozen trunk)

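The table below reports top-1 classification accuracy on the ImageNet-1k validation set, measured with a frozen trunk using features from either the last layer or the best-performing layer.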

model      top-1 IN-1k (last layer)   top-1 IN-1k (best layer)
AIM-0.6B   78.5%                      79.4%
AIM-1B     80.6%                      82.3%
AIM-3B     82.2%                      83.3%
AIM-7B     82.4%                      84.0%
