Experimental Granite 97M Multilingual R2 Core ML ANE Packages
Experimental Status
These artifacts are personal experimental Core ML conversions of
ibm-granite/granite-embedding-97m-multilingual-r2.
They are intended for macOS Core ML / Apple Neural Engine experimentation and
should be treated as research/engineering artifacts rather than an official
model release.
For the official model description, supported languages, intended use, training details, license, and limitations, refer to the original IBM Granite model card: https://huggingface.co/ibm-granite/granite-embedding-97m-multilingual-r2.
What This Repository Contains
This repository contains fixed-shape Apple Core ML .mlpackage variants derived
from the original Granite Embedding 97M Multilingual R2 model:
| Path | Shape | Suggested use |
|---|---|---|
| coreml/ane-b1-s512-macos13/granite-ane-s512-macos13.mlpackage | batch 1, sequence 512 | Recommended default fixed-shape package |
| coreml/ane-b1-s128-macos13/granite-ane-s128-macos13.mlpackage | batch 1, sequence 128 | Low-latency short-text companion |
| experimental/coreml/ane-b1-s1024-macos13/granite-ane-s1024-macos13.mlpackage | batch 1, sequence 1024 | Experimental long-context candidate |
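The packages are conversion artifacts only. Consumers still need the original Granite tokenizer, and the consuming application must keep its pooling and normalization semantics consistent with those used for evaluation. The local evaluation reported in this card used CLS pooling followed by L2 normalization.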
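Because each package accepts exactly one input shape, token sequences have to be truncated or padded to the package's sequence length before inference. The following is a minimal sketch of that step in plain Python. The function name `pad_to_fixed_shape` and the default `pad_id=0` are illustrative assumptions, not values taken from the Granite tokenizer; a real application should use the original tokenizer's own padding and truncation options and its actual pad token id.

```python
from typing import List, Sequence, Tuple


def pad_to_fixed_shape(
    token_ids: Sequence[int],
    seq_len: int = 512,
    pad_id: int = 0,  # assumption: placeholder, not the real Granite pad id
) -> Tuple[List[int], List[int]]:
    """Truncate or right-pad token ids to exactly seq_len entries.

    Returns (input_ids, attention_mask), where the mask is 1 for real
    tokens and 0 for padding. Truncation here is a naive cut at seq_len;
    a real tokenizer would also keep any trailing special token.
    """
    ids = list(token_ids[:seq_len])
    mask = [1] * len(ids)
    n_pad = seq_len - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


# Example with a short, made-up token sequence and a tiny sequence length.
input_ids, attention_mask = pad_to_fixed_shape([101, 7, 8, 9], seq_len=6)
print(input_ids)
print(attention_mask)
```

The same helper would be called with `seq_len=128` or `seq_len=1024` for the other two packages.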
Evaluation and Validation Snapshot
The table below mirrors the coverage of the official IBM Granite model card's
evaluation section, but separates upstream model-quality scores from local Core
ML conversion evidence. "Not run" means this Core ML package has not yet been
evaluated on that benchmark family.
| Metric from upstream evaluation table | Official Granite 97M R2 reference | Local Core ML s512 evidence | Status |
|---|---|---|---|
| Multilingual MTEB Retrieval (18) | 60.3 | 57.73 | Run locally; fixed-shape Core ML harness |
| MTEB Retrieval (eng, v2) (10) | 50.1 | Not run | Not evaluated for this artifact |
| MTEB Code (v1) (12) | 60.4 | Not run | Not evaluated for this artifact |
| LongEmbed (6) | 65.5 | Not run | Not evaluated for this artifact |
| RaR-b (17) | 24.9 | Not run | Not evaluated for this artifact |
| AVG | 52.2 | Not computed | Requires the same full benchmark set |
| H100 throughput (docs/s) | 2,534 | 122.2 512-token windows/s | Local Apple Silicon pure encode, 3-run short mean |
The local 57.73 value is the simple unweighted task-level mean from this
fixed-shape Core ML run multiplied by 100. It is useful as a caveated comparison
point for this Core ML artifact, but it is not a recipe-equivalent reproduction
of the official 60.3 model-card result.
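The aggregation described above can be reproduced from the per-task main scores listed later in this card. The sketch below simply averages those 18 values with equal weight and scales by 100; the variable names are illustrative.

```python
# Per-task main scores copied from the "Per-task local main scores" table.
task_scores = {
    "StackOverflowQA": 0.81613,
    "TwitterHjerneRetrieval": 0.56869,
    "AILAStatutes": 0.27827,
    "ArguAna": 0.50822,
    "HagridRetrieval": 0.98694,
    "LegalBenchCorporateLobbying": 0.91474,
    "LEMBPasskeyRetrieval": 0.38500,
    "SCIDOCS": 0.20173,
    "SpartQA": 0.67397,
    "TempReasonL1": 0.05192,
    "TRECCOVID": 0.68176,
    "WinoGrande": 0.56795,
    "BelebeleRetrieval": 0.52829,
    "MLQARetrieval": 0.60411,
    "StatcanDialogueDatasetRetrieval": 0.55207,
    "WikipediaRetrievalMultilingual": 0.83237,
    "CovidRetrieval": 0.68499,
    "MIRACLRetrievalHardNegatives": 0.55417,
}

# Simple unweighted task-level mean, scaled to the 0-100 range.
mean_score = sum(task_scores.values()) / len(task_scores)
local_score = round(mean_score * 100, 2)
print(f"{len(task_scores)} tasks, local mean = {local_score}")
```

Each task counts once regardless of how many subsets it contains, which is why this figure is a task-level mean rather than a subset-level one.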
For throughput context, IBM reports 2,534 docs/s on a single H100 using
512-token chunks. On a local Apple M4 Mac mini, the recommended sequence-512
Core ML ANE package measured 121.1 512-token windows/s in the 1024-chunk
release run. A later targeted quiet check using three 256-chunk repeats measured
a mean of 122.2 512-token windows/s, with a range of 121.5 to 123.6. This
is roughly 4.8% of the H100 throughput, or about 21x slower, while running
locally on Apple Silicon without a datacenter GPU.
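The ratio quoted above follows directly from the two throughput figures. A minimal arithmetic check, treating one H100 "doc" and one local 512-token window as comparable units (as the paragraph above does):

```python
h100_docs_per_s = 2534.0      # IBM-reported H100 throughput, 512-token chunks
local_windows_per_s = 122.2   # local M4 Mac mini mean, 512-token windows

# Fraction of H100 throughput achieved locally, and the inverse slowdown.
fraction_of_h100 = local_windows_per_s / h100_docs_per_s
slowdown_factor = h100_docs_per_s / local_windows_per_s

print(f"{fraction_of_h100 * 100:.1f}% of H100 throughput")
print(f"about {slowdown_factor:.1f}x slower")
```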
Core ML Artifact Validation
| Artifact | Shape | Status | CPU parity | Placement evidence | Pure encode throughput |
|---|---|---|---|---|---|
| granite-ane-s128-macos13.mlpackage | batch 1, sequence 128 | Optional companion | Pass | cpu-and-ne, 3,031 Neural Engine-preferred ops | 146.3 512-token windows/s |
| granite-ane-s512-macos13.mlpackage | batch 1, sequence 512 | Recommended default | Pass | cpu-and-ne, 3,209 Neural Engine-preferred ops | 122.2 512-token windows/s short mean |
| granite-ane-s1024-macos13.mlpackage | batch 1, sequence 1024 | Experimental | Pass | cpu-and-ne, 3,209 Neural Engine-preferred ops | 54.8 512-token windows/s |
Core ML / Hugging Face Parity
The sequence-512 package was also checked against the original Hugging Face model on a small local ranking fixture:
| Check | Result |
|---|---|
| Mean embedding cosine | 0.9999898 |
| Minimum embedding cosine | 0.9999838 |
| Maximum absolute embedding delta | 0.0008967 |
| Mean top-k ranking overlap | 1.0 |
| Mean nDCG delta | 0.0 |
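The first three rows of the table are standard embedding-parity statistics. The sketch below shows one way such values could be computed with NumPy from two aligned embedding matrices. It is not the harness used to produce the numbers above; the function name `parity_metrics` and the toy inputs are assumptions for illustration.

```python
import numpy as np


def parity_metrics(reference, candidate):
    """Compare two (n_texts, dim) embedding matrices row by row."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        raise ValueError("embedding matrices must have the same shape")

    # Row-wise cosine similarity between matching embeddings.
    dots = np.sum(ref * cand, axis=1)
    norms = np.linalg.norm(ref, axis=1) * np.linalg.norm(cand, axis=1)
    cosines = dots / norms

    return {
        "mean_cosine": float(cosines.mean()),
        "min_cosine": float(cosines.min()),
        "max_abs_delta": float(np.abs(ref - cand).max()),
    }


# Toy example: the candidate differs from the reference by a small offset.
reference = np.array([[0.6, 0.8, 0.0], [0.0, 0.6, 0.8]])
candidate = reference + 1e-4
print(parity_metrics(reference, candidate))
```

The ranking-overlap and nDCG rows additionally require a query/document fixture and are not covered by this sketch.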
Local Multilingual Retrieval Details
Full local Core ML run:
| Field | Value |
|---|---|
| Artifact | granite-ane-s512-macos13.mlpackage |
| Backend | Core ML |
| Benchmark | MTEB(Multilingual, v2) |
| MTEB version | 2.14.5 |
| Tasks completed | 18/18 |
| MTEB exceptions | 0 |
| MIRACL hard-negative subsets | 18/18 |
| Sequence policy | fixed sequence length 512 with tokenizer padding/truncation |
| Pooling and normalization | CLS pooling, then L2 normalization |
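The pooling and normalization policy in the last row can be sketched as follows. The sketch assumes the encoder output is a per-token hidden-state array of shape (batch, sequence, hidden) with the CLS token at position 0; whether a given package exposes hidden states or already-pooled embeddings is not stated here, so check the package's output description first. The function name is hypothetical.

```python
import numpy as np


def cls_pool_and_normalize(last_hidden_state, eps=1e-12):
    """CLS pooling followed by L2 normalization.

    last_hidden_state: array-like of shape (batch, sequence, hidden).
    Returns an array of shape (batch, hidden) with unit-length rows.
    """
    hidden = np.asarray(last_hidden_state, dtype=np.float32)
    # CLS pooling: keep only the first token's vector for each sequence.
    cls = hidden[:, 0, :]
    # L2 normalization; eps guards against division by zero.
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.maximum(norms, eps)


# Toy input: batch of 1, sequence of 4 tokens, hidden size 3.
toy = np.zeros((1, 4, 3), dtype=np.float32)
toy[0, 0] = [3.0, 4.0, 0.0]   # CLS token vector
toy[0, 1] = [9.0, 9.0, 9.0]   # other tokens are ignored by CLS pooling
embedding = cls_pool_and_normalize(toy)
print(embedding)
```

With unit-length embeddings, cosine similarity between two texts reduces to a dot product.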
Per-task local main scores:
| Task | Subsets | Main score |
|---|---|---|
| StackOverflowQA | 1 | 0.81613 |
| TwitterHjerneRetrieval | 1 | 0.56869 |
| AILAStatutes | 1 | 0.27827 |
| ArguAna | 1 | 0.50822 |
| HagridRetrieval | 1 | 0.98694 |
| LegalBenchCorporateLobbying | 1 | 0.91474 |
| LEMBPasskeyRetrieval | 8 | 0.38500 |
| SCIDOCS | 1 | 0.20173 |
| SpartQA | 1 | 0.67397 |
| TempReasonL1 | 1 | 0.05192 |
| TRECCOVID | 1 | 0.68176 |
| WinoGrande | 1 | 0.56795 |
| BelebeleRetrieval | 376 | 0.52829 |
| MLQARetrieval | 98 | 0.60411 |
| StatcanDialogueDatasetRetrieval | 4 | 0.55207 |
| WikipediaRetrievalMultilingual | 16 | 0.83237 |
| CovidRetrieval | 1 | 0.68499 |
| MIRACLRetrievalHardNegatives | 18 | 0.55417 |
These measurements are local validation results for the conversion artifacts, not official benchmark claims. Validate placement and throughput in the target application process before using the packages for performance comparisons.
License and Attribution
The original model card lists the base model license as Apache 2.0. This conversion repository follows that license metadata and attributes the base model to IBM Granite. For the official model, documentation, and limitations, refer to the original IBM Granite model card.