---
license: apple-ascl
---

## MobileCLIP CoreML Models

These are the CoreML models of MobileCLIP. For more details, refer to [MobileCLIP on HuggingFace](https://huggingface.co/apple/mobileclip_b_timm) and [MobileCLIP on GitHub](https://github.com/apple/ml-mobileclip).

The models are provided separately for each subarchitecture:

- **MobileCLIP-S0**: Designed for lightweight, fast inference, making it suitable for edge devices with limited computational resources.
- **MobileCLIP-S1**: Offers a balance between model complexity and performance, providing a good trade-off for a range of applications.
- **MobileCLIP-S2**: Focuses on higher accuracy, suited to applications where some inference speed can be traded for better results.
- **MobileCLIP-B**: Aims at the highest accuracy, optimized for environments with ample computational resources.

Each subarchitecture provides a text encoder and an image encoder as separate CoreML models:

| Model         | CLIP Text              | CLIP Image              |
|:--------------|:-----------------------|:------------------------|
| MobileCLIP-S0 | clip_text_s0.mlpackage | clip_image_s0.mlpackage |
| MobileCLIP-S1 | clip_text_s1.mlpackage | clip_image_s1.mlpackage |
| MobileCLIP-S2 | clip_text_s2.mlpackage | clip_image_s2.mlpackage |
| MobileCLIP-B  | clip_text_B.mlpackage  | clip_image_B.mlpackage  |

For detailed implementation and architecture specifics, refer to the [MobileCLIP GitHub repository](https://github.com/apple/ml-mobileclip).

**CoreML Parameters:**

| Model     | Input Name | Input Shape | Input DataType | Output Name       | Output Shape | Output DataType |
|:----------|:-----------|:------------|:---------------|:------------------|:-------------|:----------------|
| CLIP Text | input_text | (1, 77)     | INT32          | output_embeddings | (1, 512)     | FLOAT16         |

| Model      | Input Name  | Input Width | Input Height | Input ColorSpace | Output Name       | Output Shape | Output DataType |
|:-----------|:------------|:------------|:-------------|:-----------------|:------------------|:-------------|:----------------|
| CLIP Image | input_image | 256         | 256          | RGB              | output_embeddings | (1, 512)     | FLOAT16         |

*Example notebooks for performing the conversion to CoreML:*

1. **CLIPImageModel to CoreML** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ZHMzsJyAukBa4Jryv4Tmc_BOBmbQAjxf?usp=sharing)
   - This notebook demonstrates the process of converting a CLIP image model to CoreML format.
2. **CLIPTextModel to CoreML** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1PxzB8M0h2bf-uYpw7fIZImpGSVXUI7Ie?usp=sharing)
   - This notebook demonstrates the process of converting a CLIP text model to CoreML format.
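**Running the converted models (sketch):**

The converted `.mlpackage` files can be exercised from Python with `coremltools`, using the input/output names and shapes listed in the tables above. The sketch below is illustrative rather than an official usage script: the local file paths, the example image `dog.jpg`, and the use of the `mobileclip` package's tokenizer (from the GitHub repository linked above) are assumptions you may need to adapt to your setup.

```python
# Minimal sketch: query the converted MobileCLIP-S0 encoders with coremltools.
# Assumes the .mlpackage files are in the working directory and that the
# mobileclip package (from apple/ml-mobileclip) is installed for tokenization.
import coremltools as ct
import numpy as np
from PIL import Image
import mobileclip  # assumed tokenizer source; any CLIP tokenizer that pads to 77 tokens works

text_model = ct.models.MLModel("clip_text_s0.mlpackage")
image_model = ct.models.MLModel("clip_image_s0.mlpackage")

# Text encoder: token ids of shape (1, 77), INT32 -> embeddings of shape (1, 512).
tokenizer = mobileclip.get_tokenizer("mobileclip_s0")
tokens = tokenizer(["a photo of a dog"]).numpy().astype(np.int32)
text_embedding = np.asarray(
    text_model.predict({"input_text": tokens})["output_embeddings"], dtype=np.float32
)

# Image encoder: 256x256 RGB image -> embeddings of shape (1, 512).
image = Image.open("dog.jpg").convert("RGB").resize((256, 256))  # hypothetical example image
image_embedding = np.asarray(
    image_model.predict({"input_image": image})["output_embeddings"], dtype=np.float32
)

# Standard CLIP retrieval score: cosine similarity of L2-normalized embeddings.
text_embedding /= np.linalg.norm(text_embedding, axis=-1, keepdims=True)
image_embedding /= np.linalg.norm(image_embedding, axis=-1, keepdims=True)
print("cosine similarity:", float(text_embedding @ image_embedding.T))
```

Note that `MLModel.predict` requires macOS, since it dispatches to the CoreML runtime; on other platforms the packages can be converted and inspected but not executed.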