Experimental Granite 97M Multilingual R2 Core ML ANE Packages

Experimental Status

These artifacts are personal experimental Core ML conversions of ibm-granite/granite-embedding-97m-multilingual-r2. They are intended for macOS Core ML / Apple Neural Engine experimentation and should be treated as research/engineering artifacts rather than an official model release.

For the official model description, supported languages, intended use, training details, license, and limitations, refer to the original IBM Granite model card: https://huggingface.co/ibm-granite/granite-embedding-97m-multilingual-r2.

What This Repository Contains

This repository contains fixed-shape Apple Core ML .mlpackage variants derived from the original Granite Embedding 97M Multilingual R2 model:

| Path | Shape | Suggested use |
| --- | --- | --- |
| coreml/ane-b1-s512-macos13/granite-ane-s512-macos13.mlpackage | batch 1, sequence 512 | Recommended default fixed-shape package |
| coreml/ane-b1-s128-macos13/granite-ane-s128-macos13.mlpackage | batch 1, sequence 128 | Low-latency short-text companion |
| experimental/coreml/ane-b1-s1024-macos13/granite-ane-s1024-macos13.mlpackage | batch 1, sequence 1024 | Experimental long-context candidate |

The packages are conversion artifacts only. Consumers still need the original Granite tokenizer, and the consuming application must apply its pooling and normalization consistently; the local evaluation reported in this card used CLS pooling followed by L2 normalization.

Evaluation and Validation Snapshot

The table below mirrors the coverage of the evaluation section in the official IBM Granite model card, but separates upstream model-quality scores from local Core ML conversion evidence. "Not run" means that this Core ML package has not yet been evaluated on that benchmark family.

| Metric from upstream evaluation table | Official Granite 97M R2 reference | Local Core ML s512 evidence | Status |
| --- | --- | --- | --- |
| Multilingual MTEB Retrieval (18) | 60.3 | 57.73 | Run locally; fixed-shape Core ML harness |
| MTEB Retrieval (eng, v2) (10) | 50.1 | Not run | Not evaluated for this artifact |
| MTEB Code (v1) (12) | 60.4 | Not run | Not evaluated for this artifact |
| LongEmbed (6) | 65.5 | Not run | Not evaluated for this artifact |
| RaR-b (17) | 24.9 | Not run | Not evaluated for this artifact |
| AVG | 52.2 | Not computed | Requires the same full benchmark set |
| H100 throughput (docs/s) | 2,534 | 122.2 512-token windows/s | Local Apple Silicon pure encode, 3-run short mean |

The local 57.73 value is the simple unweighted task-level mean from this fixed-shape Core ML run multiplied by 100. It is useful as a caveated comparison point for this Core ML artifact, but it is not a recipe-equivalent reproduction of the official 60.3 model-card result.

For throughput context, IBM reports 2,534 docs/s on a single H100 using 512-token chunks. On a local Apple M4 Mac mini, the recommended sequence-512 Core ML ANE package measured 121.1 512-token windows/s in the 1024-chunk release run. A later targeted quiet check using three 256-chunk repeats measured a mean of 122.2 512-token windows/s, with a range of 121.5 to 123.6. This is roughly 4.8% of the H100 throughput, or about 21x slower, while running locally on Apple Silicon without a datacenter GPU.
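The ratio quoted above can be rechecked with a few lines of arithmetic. This is only a sketch of the comparison as stated in the text; it assumes, as the paragraph does, that one H100 "doc" (a 512-token chunk) and one local 512-token window are comparable units. The variable names are illustrative.

```python
# Figures quoted in the paragraph above.
H100_DOCS_PER_S = 2534.0    # IBM-reported, single H100, 512-token chunks
M4_WINDOWS_PER_S = 122.2    # local three-run mean, sequence-512 package

# Assumption: docs/s and 512-token windows/s are treated as the same unit.
fraction = M4_WINDOWS_PER_S / H100_DOCS_PER_S
slowdown = H100_DOCS_PER_S / M4_WINDOWS_PER_S

print(f"{fraction:.1%} of the H100 throughput")  # prints "4.8% of the H100 throughput"
print(f"{slowdown:.1f}x slower")                 # prints "20.7x slower"
```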

Core ML Artifact Validation

| Artifact | Shape | Status | CPU parity | Placement evidence | Pure encode throughput |
| --- | --- | --- | --- | --- | --- |
| granite-ane-s128-macos13.mlpackage | batch 1, sequence 128 | Optional companion | Pass | cpu-and-ne, 3,031 Neural Engine-preferred ops | 146.3 512-token windows/s |
| granite-ane-s512-macos13.mlpackage | batch 1, sequence 512 | Recommended default | Pass | cpu-and-ne, 3,209 Neural Engine-preferred ops | 122.2 512-token windows/s short mean |
| granite-ane-s1024-macos13.mlpackage | batch 1, sequence 1024 | Experimental | Pass | cpu-and-ne, 3,209 Neural Engine-preferred ops | 54.8 512-token windows/s |

Core ML / Hugging Face Parity

The sequence-512 package was also checked against the original Hugging Face model on a small local ranking fixture:

| Check | Result |
| --- | --- |
| Mean embedding cosine | 0.9999898 |
| Minimum embedding cosine | 0.9999838 |
| Maximum absolute embedding delta | 0.0008967 |
| Mean top-k ranking overlap | 1.0 |
| Mean nDCG delta | 0.0 |
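For readers who want to run a similar check on their own fixture, the embedding-level rows above (mean cosine, minimum cosine, maximum absolute delta) could be computed roughly as follows. This is a minimal sketch and not the harness used for this card: the function names are hypothetical, and the toy vectors stand in for real Hugging Face and Core ML embeddings of the same texts.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def parity_summary(reference, candidate):
    """Compare two lists of embeddings, paired by position."""
    cosines = [cosine(r, c) for r, c in zip(reference, candidate)]
    deltas = [abs(x - y) for r, c in zip(reference, candidate) for x, y in zip(r, c)]
    return {
        "mean_cosine": sum(cosines) / len(cosines),
        "min_cosine": min(cosines),
        "max_abs_delta": max(deltas),
    }

# Toy stand-ins (assumption): real inputs would be the two backends' embeddings.
hf_embeddings = [[0.6, 0.8], [1.0, 0.0]]
coreml_embeddings = [[0.6, 0.8], [0.999, 0.001]]
summary = parity_summary(hf_embeddings, coreml_embeddings)
print(summary)
```

The ranking rows (top-k overlap, nDCG delta) need a labeled retrieval fixture and are not covered by this sketch.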

Local Multilingual Retrieval Details

Full local Core ML run:

| Field | Value |
| --- | --- |
| Artifact | granite-ane-s512-macos13.mlpackage |
| Backend | Core ML |
| Benchmark | MTEB(Multilingual, v2) |
| MTEB version | 2.14.5 |
| Tasks completed | 18/18 |
| MTEB exceptions | 0 |
| MIRACL hard-negative subsets | 18/18 |
| Sequence policy | fixed sequence length 512 with tokenizer padding/truncation |
| Pooling and normalization | CLS pooling, then L2 normalization |
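The sequence policy and pooling rows above can be illustrated with a small pure-Python sketch. It is not the evaluation harness: `pad_or_truncate` and `cls_pool_and_normalize` are hypothetical helper names, `pad_id=0` is a placeholder (the real padding id comes from the Granite tokenizer), and it assumes the model output is a list of per-token hidden-state vectors with the CLS token at position 0.

```python
import math

def pad_or_truncate(token_ids, length=512, pad_id=0):
    """Return (input_ids, attention_mask), each exactly `length` long.

    Note: plain slicing drops any trailing special token; a real tokenizer's
    own truncation logic handles that and should be preferred in practice.
    """
    ids = list(token_ids[:length])
    mask = [1] * len(ids)
    padding = length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding

def cls_pool_and_normalize(hidden_states):
    """CLS pooling (take the first token's vector), then L2 normalization."""
    cls_vector = hidden_states[0]
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        return list(cls_vector)  # avoid division by zero on a degenerate vector
    return [x / norm for x in cls_vector]

# Toy usage with made-up token ids and a 2-dimensional "hidden state".
input_ids, attention_mask = pad_or_truncate([101, 7, 8, 102], length=8)
embedding = cls_pool_and_normalize([[3.0, 4.0], [1.0, 1.0]])
print(input_ids, attention_mask, embedding)
```

Because the packages are fixed-shape, every input has to reach the model at exactly the package's sequence length, which is why padding and truncation happen before inference rather than inside it.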

Per-task local main scores:

| Task | Subsets | Main score |
| --- | --- | --- |
| StackOverflowQA | 1 | 0.81613 |
| TwitterHjerneRetrieval | 1 | 0.56869 |
| AILAStatutes | 1 | 0.27827 |
| ArguAna | 1 | 0.50822 |
| HagridRetrieval | 1 | 0.98694 |
| LegalBenchCorporateLobbying | 1 | 0.91474 |
| LEMBPasskeyRetrieval | 8 | 0.38500 |
| SCIDOCS | 1 | 0.20173 |
| SpartQA | 1 | 0.67397 |
| TempReasonL1 | 1 | 0.05192 |
| TRECCOVID | 1 | 0.68176 |
| WinoGrande | 1 | 0.56795 |
| BelebeleRetrieval | 376 | 0.52829 |
| MLQARetrieval | 98 | 0.60411 |
| StatcanDialogueDatasetRetrieval | 4 | 0.55207 |
| WikipediaRetrievalMultilingual | 16 | 0.83237 |
| CovidRetrieval | 1 | 0.68499 |
| MIRACLRetrievalHardNegatives | 18 | 0.55417 |
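The headline 57.73 figure can be reproduced from the per-task main scores listed above as a simple unweighted task-level mean multiplied by 100. The snippet below only rechecks that arithmetic; the dictionary name is illustrative.

```python
# Per-task main scores copied from the list above.
main_scores = {
    "StackOverflowQA": 0.81613,
    "TwitterHjerneRetrieval": 0.56869,
    "AILAStatutes": 0.27827,
    "ArguAna": 0.50822,
    "HagridRetrieval": 0.98694,
    "LegalBenchCorporateLobbying": 0.91474,
    "LEMBPasskeyRetrieval": 0.38500,
    "SCIDOCS": 0.20173,
    "SpartQA": 0.67397,
    "TempReasonL1": 0.05192,
    "TRECCOVID": 0.68176,
    "WinoGrande": 0.56795,
    "BelebeleRetrieval": 0.52829,
    "MLQARetrieval": 0.60411,
    "StatcanDialogueDatasetRetrieval": 0.55207,
    "WikipediaRetrievalMultilingual": 0.83237,
    "CovidRetrieval": 0.68499,
    "MIRACLRetrievalHardNegatives": 0.55417,
}

# Unweighted mean: every task counts once, regardless of its subset count.
mean_score = sum(main_scores.values()) / len(main_scores)
print(f"{len(main_scores)} tasks, mean x 100 = {mean_score * 100:.2f}")
# prints "18 tasks, mean x 100 = 57.73"
```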

These measurements are local validation results for the conversion artifacts, not official benchmark claims. Validate placement and throughput in the target application process before using the packages for performance comparisons.
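One way to repeat a throughput check inside the target process is a small timing loop like the one below. It is a hedged sketch, not the harness behind the numbers in this card: `measure_windows_per_second` is a hypothetical helper, and `fake_encode` is a stand-in for a real call that runs one pre-tokenized 512-token window through the loaded package (for example via coremltools' `MLModel.predict`, whose input names are not documented in this card).

```python
import time
from statistics import mean

def measure_windows_per_second(encode_window, n_windows, repeats=3):
    """Time `repeats` runs of `n_windows` encode calls and summarize the rates."""
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(n_windows):
            encode_window()
        elapsed = time.perf_counter() - start
        rates.append(n_windows / elapsed)
    return {"mean": mean(rates), "min": min(rates), "max": max(rates)}

def fake_encode():
    """Placeholder workload (assumption); replace with a real model call."""
    sum(range(1000))

result = measure_windows_per_second(fake_encode, n_windows=256)
print(result)
```

Running a few untimed warm-up calls first and keeping the machine otherwise idle tends to make the repeats more comparable; reporting the range alongside the mean mirrors how the sequence-512 figure is presented above.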

License and Attribution

The original model card lists the base model license as Apache 2.0. This conversion repository follows that license metadata and attributes the base model to IBM Granite. For the official model, documentation, and limitations, refer to the original IBM Granite model card.
