MobileCLIP2 was introduced in [MobileCLIP2: Improving Multi-Modal Reinforced Training](http://arxiv.org/abs/2508.20691) (TMLR August 2025 <mark>Featured</mark>), by Fartash Faghri, Pavan Kumar Anasosalu Vasu, Cem Koc, Vaishaal Shankar, Alexander T Toshev, Oncel Tuzel, Hadi Pouransari.
This repository contains the **MobileCLIP2-S4** checkpoint.

![MobileCLIP2 accuracy vs. latency](docs/fig_accuracy_latency.png)
| Model | # Seen <BR>Samples (B) | # Params (M) <BR> (img + txt) | Latency (ms) <BR> (img + txt) | IN-1k Zero-Shot <BR> Top-1 Acc. (%) | Avg. Perf. (%) <BR> on 38 datasets |
|:----------------------------------------------------------|:----------------------:|:-----------------------------:|:-----------------------------:|:-----------------------------------:|:----------------------------------:|
| [MobileCLIP2-S0](https://hf.co/apple/MobileCLIP2-S0) | 13 | 11.4 + 63.4 | 1.5 + 3.3 | 71.5 | 59.7 |
| [MobileCLIP2-S2](https://hf.co/apple/MobileCLIP2-S2) | 13 | 35.7 + 63.4 | 3.6 + 3.3 | 77.2 | 64.1 |
| [MobileCLIP2-B](https://hf.co/apple/MobileCLIP2-B) | 13 | 86.3 + 63.4 | 10.4 + 3.3 | 79.4 | 65.8 |
| [MobileCLIP2-S3](https://hf.co/apple/MobileCLIP2-S3) | 13 | 125.1 + 123.6 | 8.0 + 6.6 | 80.7 | 66.8 |
```python
import open_clip
from PIL import Image
from mobileclip.modules.common.mobileone import reparameterize_model

model, _, preprocess = open_clip.create_model_and_transforms('MobileCLIP2-S4', pretrained='/path/to/mobileclip_s4.pt')
tokenizer = open_clip.get_tokenizer('MobileCLIP2-S4')

# Model needs to be in eval mode for inference because of batchnorm layers unlike ViTs
model.eval()

# For inference/model exporting purposes, please reparameterize first
model = reparameterize_model(model)

image = preprocess(Image.open("docs/fig_accuracy_latency.png").convert('RGB')).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])
```
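From here, zero-shot classification follows the standard `open_clip` pattern. A minimal sketch, assuming the `model`, `image`, and `text` objects from the snippet above (the normalization and softmax steps are the usual open_clip recipe, not lines from this card):

```python
import torch

# Encode both modalities, L2-normalize the features, and score the captions.
with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Cosine similarities scaled by 100, turned into a distribution over captions
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

# Probabilities over ["a diagram", "a dog", "a cat"]
print("Label probs:", text_probs)
```

Reparameterizing before this step matters for timing and export: the MobileOne-style blocks in the image encoder fold their parallel training-time branches into single convolutions, so the reparameterized model is the one whose latency matches the table above.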