MobileCLIP CoreML Models
The models described here are CoreML conversions of the original MobileCLIP models from Apple. For more details, refer to MobileCLIP on HuggingFace and MobileCLIP on GitHub.
The models are provided separately for each subarchitecture:
- MobileCLIP-S0: This subarchitecture is designed for lightweight and fast inference, making it suitable for edge devices with limited computational resources.
- MobileCLIP-S1: This subarchitecture offers a balance between model complexity and performance, providing a good trade-off for various applications.
- MobileCLIP-S2: This subarchitecture focuses on achieving higher accuracy, ideal for applications where inference speed can be slightly compromised for better results.
- MobileCLIP-B: This subarchitecture aims to deliver the highest possible accuracy, optimized for environments with ample computational resources.
Each subarchitecture provides a TextEncoder and an ImageEncoder, packaged as separate CoreML models:
Model | CLIP Text | CLIP Image |
---|---|---|
MobileCLIP-S0 | clip_text_s0.mlpackage | clip_image_s0.mlpackage |
MobileCLIP-S1 | clip_text_s1.mlpackage | clip_image_s1.mlpackage |
MobileCLIP-S2 | clip_text_s2.mlpackage | clip_image_s2.mlpackage |
MobileCLIP-B | clip_text_B.mlpackage | clip_image_B.mlpackage |
For detailed implementation and architecture specifics, refer to the MobileCLIP GitHub repository.
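To experiment with the packages locally, they can be fetched with huggingface_hub. The snippet below is a minimal sketch; the `repo_id` is a placeholder that must be replaced with this repository's actual ID.

```python
from huggingface_hub import snapshot_download

# Placeholder repo_id -- substitute the actual Hugging Face repository ID.
# Each .mlpackage is a directory, so match its contents with a glob pattern.
local_dir = snapshot_download(
    repo_id="<user>/MobileCLIP-CoreML",
    allow_patterns=["clip_text_s0.mlpackage/*", "clip_image_s0.mlpackage/*"],
)
print("Packages downloaded to:", local_dir)
```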
Example Usage
An example of using these CoreML models in a Swift application for iOS can be found in the CLIP-Finder project.
CoreML Parameters:
Model | Input Name | Input Shape | Input DataType | Output Name | Output Shape | Output DataType |
---|---|---|---|---|---|---|
CLIP Text | input_text | (1,77) | INT32 | output_embeddings | (1,512) | FLOAT16 |
Model | Input Name | Input Width | Input Height | Input ColorSpace | Output Name | Output Shape | Output DataType |
---|---|---|---|---|---|---|---|
CLIP Image | input_image | 256 | 256 | RGB | output_embeddings | (1,512) | FLOAT16 |
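Outside of an iOS app, the packages can also be exercised directly from Python with coremltools on macOS, using the input and output names listed above. The following is a minimal sketch, assuming the S0 `.mlpackage` bundles sit in the working directory, that a local `example.jpg` exists, and that the zero token array stands in for real CLIP tokenizer output:

```python
import numpy as np
import coremltools as ct
from PIL import Image

# Load the S0 encoders (paths are assumed to point at the downloaded .mlpackage bundles).
text_model = ct.models.MLModel("clip_text_s0.mlpackage")
image_model = ct.models.MLModel("clip_image_s0.mlpackage")

# Text encoder: expects "input_text" with shape (1, 77) of INT32 token IDs.
# A dummy sequence is used here; in practice the CLIP tokenizer output goes in.
tokens = np.zeros((1, 77), dtype=np.int32)
text_out = text_model.predict({"input_text": tokens})
text_embedding = text_out["output_embeddings"]        # shape (1, 512), FLOAT16

# Image encoder: expects "input_image" as a 256x256 RGB image.
img = Image.open("example.jpg").convert("RGB").resize((256, 256))
image_out = image_model.predict({"input_image": img})
image_embedding = image_out["output_embeddings"]      # shape (1, 512), FLOAT16

# Cosine similarity between the two embeddings.
a = text_embedding.astype(np.float32).ravel()
b = image_embedding.astype(np.float32).ravel()
print(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
```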
CoreML Profile (Benchmark) on Apple M1
Prediction Times (Apple M1) | CPU + ANE | CPU + GPU | CPU Only |
---|---|---|---|
clip_image_s0 | 1.4ms | 7.4ms | 12.7ms |
clip_image_s1 | 2.1ms | 13.3ms | 21.8ms |
clip_image_s2 | 3.0ms | 19.0ms | 28.5ms |
clip_image_b | 12.4ms | 36.2ms | 38.1ms |
clip_text_s0 | 1.1ms | 4.1ms | 4.8ms |
clip_text_s1 | 2.0ms | 7.1ms | 9.5ms |
clip_text_s2 | 2.0ms | 7.1ms | 10.0ms |
clip_text_b | 2.0ms | 7.2ms | 9.8ms |
The profiling was performed using CoreMLProfiler.
Example scripts are available for performing the conversion to CoreML.
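The scripts themselves are not reproduced here, but the general shape of such a conversion can be sketched with coremltools and the mobileclip package from the GitHub repository above. This is a hedged sketch, not the exact script used: the checkpoint path is a placeholder, and only the text encoder is shown; the image encoder would be traced the same way around encode_image and converted with a ct.ImageType input (256x256, RGB) instead.

```python
import numpy as np
import torch
import coremltools as ct
import mobileclip  # from the Apple ml-mobileclip repository

# Load the original PyTorch MobileCLIP model (checkpoint path is a placeholder).
model, _, _ = mobileclip.create_model_and_transforms(
    "mobileclip_s0", pretrained="checkpoints/mobileclip_s0.pt"
)
model.eval()

class TextEncoder(torch.nn.Module):
    """Thin wrapper so that only the text branch is traced."""
    def __init__(self, clip):
        super().__init__()
        self.clip = clip

    def forward(self, input_text):
        # Token embeddings expect int64 indices; the CoreML input stays INT32.
        return self.clip.encode_text(input_text.to(torch.int64))

example_tokens = torch.zeros(1, 77, dtype=torch.int32)
traced = torch.jit.trace(TextEncoder(model), example_tokens)

# Convert to an ML Program with the I/O names documented above.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input_text", shape=(1, 77), dtype=np.int32)],
    outputs=[ct.TensorType(name="output_embeddings")],
    convert_to="mlprogram",
    compute_precision=ct.precision.FLOAT16,
)
mlmodel.save("clip_text_s0.mlpackage")
```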