MobileCLIP2 was introduced in [MobileCLIP2: Improving Multi-Modal Reinforced Training](http://arxiv.org/abs/2508.20691) (TMLR August 2025 <mark>Featured</mark>), by Fartash Faghri, Pavan Kumar Anasosalu Vasu, Cem Koc, Vaishaal Shankar, Alexander T Toshev, Oncel Tuzel, Hadi Pouransari.
This repository contains the **MobileCLIP2-S4** checkpoint.

![MobileCLIP2 accuracy vs. latency](docs/fig_accuracy_latency.png)
| Model | # Seen <BR>Samples (B) | # Params (M) <BR> (img + txt) | Latency (ms) <BR> (img + txt) | IN-1k Zero-Shot <BR> Top-1 Acc. (%) | Avg. Perf. (%) <BR> on 38 datasets |
|:----------------------------------------------------------|:----------------------:|:-----------------------------:|:-----------------------------:|:-----------------------------------:|:----------------------------------:|
| [MobileCLIP2-S0](https://hf.co/apple/MobileCLIP2-S0) | 13 | 11.4 + 63.4 | 1.5 + 3.3 | 71.5 | 59.7 |
| [MobileCLIP2-S2](https://hf.co/apple/MobileCLIP2-S2) | 13 | 35.7 + 63.4 | 3.6 + 3.3 | 77.2 | 64.1 |
| [MobileCLIP2-B](https://hf.co/apple/MobileCLIP2-B) | 13 | 86.3 + 63.4 | 10.4 + 3.3 | 79.4 | 65.8 |
| [MobileCLIP2-S3](https://hf.co/apple/MobileCLIP2-S3) | 13 | 125.1 + 123.6 | 8.0 + 6.6 | 80.7 | 66.8 |
```python
import open_clip
from PIL import Image
from mobileclip.modules.common.mobileone import reparameterize_model

model, _, preprocess = open_clip.create_model_and_transforms('MobileCLIP2-S4', pretrained='/path/to/mobileclip_s4.pt')
tokenizer = open_clip.get_tokenizer('MobileCLIP2-S4')

# Model needs to be in eval mode for inference because of batchnorm layers unlike ViTs
model.eval()

# For inference/model exporting purposes, please reparameterize first
model = reparameterize_model(model)

image = preprocess(Image.open("docs/fig_accuracy_latency.png").convert('RGB')).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])
```
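From here, zero-shot classification follows the standard `open_clip` pattern. A minimal sketch, assuming the `model`, `image`, and `text` objects from the snippet above (the normalization and softmax steps are the usual open_clip recipe, not lines from this card):

```python
import torch

# Encode both modalities, L2-normalize the features, and score the captions.
with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Cosine similarities scaled by 100, turned into a distribution over captions
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

# Probabilities over ["a diagram", "a dog", "a cat"]
print("Label probs:", text_probs)
```

Reparameterizing before this step matters for timing and export: the MobileOne-style blocks in the image encoder fold their parallel training-time branches into single convolutions, so the reparameterized model is the one whose latency matches the table above.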