apple/AIM-600M

Image Classification

PyTorch

ml-aim

alaaelnouby committed on Jan 19, 2024

Commit 5ac297a • 1 Parent(s): 12524c5

Update README.md

README.md +80 -0

@@ -2,4 +2,84 @@
 license: other
 license_name: apple-sample-code-license
 license_link: LICENSE
+library_name: ml-aim
+pipeline_tag: image-classification
 ---
+# AIM: Autoregressive Image Models
+*Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar,
+Joshua M Susskind, and Armand Joulin*
+This software project accompanies the research paper, [Scalable Pre-training of Large Autoregressive Image Models](https://arxiv.org/abs/2401.08541).
+We introduce **AIM**, a collection of vision models pre-trained with an autoregressive generative objective.
+We show that autoregressive pre-training of image features exhibits scaling properties similar to those of its
+textual counterpart (i.e., Large Language Models). Specifically, we highlight two findings:
+1. the model capacity can be trivially scaled to billions of parameters, and
+2. AIM effectively leverages large collections of uncurated image data.
+## Installation
+Please install PyTorch using the official [installation instructions](https://pytorch.org/get-started/locally/).
+Afterward, install the package as:
+```commandline
+pip install git+https://git@github.com/apple/ml-aim.git
+```
+## Usage
+The example below loads the model from the [HuggingFace Hub](https://huggingface.co/docs/hub/) and runs a forward pass on a single image:
+```python
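+from PIL import Image
+from aim.torch.models import AIMForImageClassification
+from aim.torch.data import val_transforms
+# Load the input image (the path is elided in the original).
+img = Image.open(...)
+# Load the pre-trained weights from the HuggingFace Hub.
+model = AIMForImageClassification.from_pretrained("apple/aim-600M")
+transform = val_transforms()
+# Preprocess the image and add a batch dimension.
+inp = transform(img).unsqueeze(0)
+# The forward pass returns the classification logits and the features.
+logits, features = model(inp)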
+```
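The `logits` returned by the model can be turned into class probabilities and a ranked list of predictions. The following is a minimal sketch, not part of `ml-aim`: it assumes `logits` has shape `(batch, num_classes)` and that the scores for one image have been converted to a plain Python list, for example with `logits[0].tolist()`. The helper name `top_k_predictions` and the toy values are hypothetical.

```python
import math


def top_k_predictions(logits, k=5):
    """Return the k most likely (class_index, probability) pairs.

    `logits` is a flat list of floats, one score per class.
    """
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Toy stand-in for one row of model output (hypothetical values).
example_logits = [0.1, 2.0, -1.0, 0.5]
predictions = top_k_predictions(example_logits, k=2)
```

With real model output, the class indices would still need to be mapped to ImageNet-1k label names, which this sketch does not cover.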
+### ImageNet-1k results (frozen trunk)
+The table below reports top-1 classification accuracy on the ImageNet-1k validation set with a frozen trunk.
+<table style="margin: auto">
+  <thead>
+    <tr>
+      <th rowspan="2">model</th>
+      <th colspan="2">top-1 IN-1k</th>
+    </tr>
+    <tr>
+      <th>last layer</th>
+      <th>best layer</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>AIM-0.6B</td>
+      <td>78.5%</td>
+      <td>79.4%</td>
+    </tr>
+    <tr>
+      <td>AIM-1B</td>
+      <td>80.6%</td>
+      <td>82.3%</td>
+    </tr>
+    <tr>
+      <td>AIM-3B</td>
+      <td>82.2%</td>
+      <td>83.3%</td>
+    </tr>
+    <tr>
+      <td>AIM-7B</td>
+      <td>82.4%</td>
+      <td>84.0%</td>
+    </tr>
+  </tbody>
+</table>
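For reference, top-1 accuracy is the fraction of validation images whose highest-scoring class matches the ground-truth label. The sketch below illustrates the metric on hypothetical toy data; the function name `top1_accuracy` is illustrative and not part of `ml-aim`, and it is not the evaluation code used to produce the numbers in the table.

```python
def top1_accuracy(batch_logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    if not labels or len(batch_logits) != len(labels):
        raise ValueError("batch_logits and labels must be non-empty and equal in length")
    correct = 0
    for scores, label in zip(batch_logits, labels):
        # Argmax over the class scores of one sample.
        predicted = max(range(len(scores)), key=lambda i: scores[i])
        correct += int(predicted == label)
    return correct / len(labels)


# Toy data: 4 samples, 3 classes (hypothetical values).
toy_logits = [
    [2.0, 0.1, 0.3],  # predicts class 0
    [0.2, 1.5, 0.1],  # predicts class 1
    [0.9, 0.8, 0.7],  # predicts class 0
    [0.0, 0.2, 3.0],  # predicts class 2
]
toy_labels = [0, 1, 2, 2]
accuracy = top1_accuracy(toy_logits, toy_labels)  # 3 of 4 correct
```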