Experimental Granite 97M Multilingual R2 Core ML ANE Packages

Experimental Status

These artifacts are personal experimental Core ML conversions of ibm-granite/granite-embedding-97m-multilingual-r2. They are intended for macOS Core ML / Apple Neural Engine experimentation and should be treated as research/engineering artifacts rather than an official model release.

For the official model description, supported languages, intended use, training details, license, and limitations, refer to the original IBM Granite model card: https://huggingface.co/ibm-granite/granite-embedding-97m-multilingual-r2.

What This Repository Contains

This repository contains fixed-shape Apple Core ML .mlpackage variants derived from the original Granite Embedding 97M Multilingual R2 model:

Path	Shape	Suggested use
`coreml/ane-b1-s512-macos13/granite-ane-s512-macos13.mlpackage`	batch 1, sequence 512	Recommended default fixed-shape package
`coreml/ane-b1-s128-macos13/granite-ane-s128-macos13.mlpackage`	batch 1, sequence 128	Low-latency short-text companion
`experimental/coreml/ane-b1-s1024-macos13/granite-ane-s1024-macos13.mlpackage`	batch 1, sequence 1024	Experimental long-context candidate
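One way an application might route inputs between these packages is to pick the smallest fixed shape that fits a tokenized input. The sketch below assumes exactly that. The routing rule, the `choose_package` name, and the opt-in flag for the experimental 1024 package are hypothetical choices made for this example. Only the paths come from the table.

```python
# Package paths from the table above; the routing rule itself is a hypothetical example.
PACKAGES = {
    128: "coreml/ane-b1-s128-macos13/granite-ane-s128-macos13.mlpackage",
    512: "coreml/ane-b1-s512-macos13/granite-ane-s512-macos13.mlpackage",
    1024: "experimental/coreml/ane-b1-s1024-macos13/granite-ane-s1024-macos13.mlpackage",
}


def choose_package(num_tokens: int, allow_experimental: bool = False) -> tuple[int, str]:
    """Return (sequence_length, package_path) for an input of num_tokens tokens.

    num_tokens should count the special tokens the tokenizer adds. Inputs longer
    than the returned sequence length must still be truncated or chunked.
    """
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    if num_tokens <= 128:
        return 128, PACKAGES[128]  # low-latency short-text companion
    if num_tokens > 512 and allow_experimental:
        return 1024, PACKAGES[1024]  # experimental long-context candidate
    return 512, PACKAGES[512]  # recommended default


print(choose_package(40))
print(choose_package(900))
print(choose_package(900, allow_experimental=True))
```

Each package accepts only its own fixed shape. The caller therefore still has to truncate or chunk inputs that are longer than the selected sequence length.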

The packages are conversion artifacts only. Consumers still need the original Granite tokenizer. They must also apply the same pooling and normalization that their application uses with the original model. The local evaluation below used CLS pooling followed by L2 normalization.
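Every package has a fixed input shape, so each tokenized text must be padded or truncated to exactly the package's sequence length before encoding. The helper below is a minimal stdlib sketch of that step. The `to_fixed_shape` name and the `pad_id` value are illustrative and not taken from this repository. The sketch also assumes the token list already begins with a start token and ends with an end token. With the Hugging Face tokenizer, the same effect is usually obtained by calling it with `padding="max_length"`, `truncation=True`, and `max_length` set to the package's sequence length.

```python
def to_fixed_shape(token_ids: list[int], seq_len: int, pad_id: int) -> tuple[list[int], list[int]]:
    """Pad or truncate one tokenized sequence to exactly seq_len positions.

    Returns (input_ids, attention_mask). Assumes token_ids already starts with the
    tokenizer's start/CLS token and ends with its end/separator token.
    """
    if seq_len < 2:
        raise ValueError("seq_len must leave room for the start and end tokens")
    ids = list(token_ids)
    if len(ids) > seq_len:
        # Keep the first seq_len - 1 tokens plus the final end token.
        ids = ids[: seq_len - 1] + [ids[-1]]
    real = len(ids)
    attention_mask = [1] * real + [0] * (seq_len - real)
    input_ids = ids + [pad_id] * (seq_len - real)
    return input_ids, attention_mask


# Hypothetical token IDs and pad ID, for illustration only.
ids, mask = to_fixed_shape([0, 17, 42, 2], seq_len=8, pad_id=1)
print(ids)   # [0, 17, 42, 2, 1, 1, 1, 1]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```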

Evaluation and Validation Snapshot

The table below mirrors the coverage of the evaluation section in the official IBM Granite model card. It separates upstream model-quality scores from local Core ML conversion evidence. "Not run" means this Core ML package has not yet been evaluated on that benchmark family.

Metric (number of tasks) from upstream evaluation table	Official Granite 97M R2 reference	Local Core ML `s512` evidence	Status
Multilingual MTEB Retrieval (18)	`60.3`	`57.73`	Run locally; fixed-shape Core ML harness
MTEB Retrieval (eng, v2) (10)	`50.1`	Not run	Not evaluated for this artifact
MTEB Code (v1) (12)	`60.4`	Not run	Not evaluated for this artifact
LongEmbed (6)	`65.5`	Not run	Not evaluated for this artifact
RaR-b (17)	`24.9`	Not run	Not evaluated for this artifact
Average (of the five benchmark families above)	`52.2`	Not computed	Requires all five benchmark families to be run
Throughput (upstream: docs/s on a single H100, 512-token chunks)	`2,534`	`122.2` 512-token windows/s	Local Apple M4 Mac mini, pure encode; mean of three short runs

The local 57.73 value is the unweighted mean of the 18 task-level main scores from this fixed-shape Core ML run, multiplied by 100. It is a useful but caveated comparison point for this Core ML artifact. It is not a recipe-equivalent reproduction of the official 60.3 model-card result.

For throughput context, IBM reports 2,534 docs/s on a single H100 using 512-token chunks.

On a local Apple M4 Mac mini, the recommended sequence-512 Core ML ANE package measured 121.1 512-token windows/s in the release run over 1,024 chunks. A later targeted quiet check used three repeats of 256 chunks each. It measured a mean of 122.2 512-token windows/s, with a range of 121.5 to 123.6.

If one 512-token window is treated as comparable to one 512-token document chunk, this is roughly 4.8% of the H100 throughput, or about 21x slower. It runs locally on Apple Silicon without a datacenter GPU.
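The throughput comparison above is simple arithmetic on the two reported rates. The sketch below reproduces it under the stated assumption that one 512-token window is comparable to one 512-token document chunk in IBM's measurement. The constant names are hypothetical.

```python
# Reported rates from the text above.
H100_DOCS_PER_S = 2534.0   # IBM: single H100, 512-token chunks
M4_WINDOWS_PER_S = 122.2   # local s512 package on an Apple M4 Mac mini, 3-run mean
WINDOW_TOKENS = 512

# Assumption: one 512-token window is treated as one 512-token document chunk.
fraction = M4_WINDOWS_PER_S / H100_DOCS_PER_S
slowdown = H100_DOCS_PER_S / M4_WINDOWS_PER_S
token_positions_per_s = M4_WINDOWS_PER_S * WINDOW_TOKENS  # includes any padding

print(f"{fraction:.1%} of H100 throughput, about {slowdown:.0f}x slower")
# 4.8% of H100 throughput, about 21x slower
print(f"{token_positions_per_s:,.0f} token positions/s")
# 62,566 token positions/s
```

Fixed-shape windows are always 512 positions long. The token-position rate therefore counts padding as well, and it overstates real-token throughput for short inputs.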

Core ML Artifact Validation

Artifact	Shape	Status	CPU parity	Placement evidence	Pure encode throughput
`granite-ane-s128-macos13.mlpackage`	batch 1, sequence 128	Optional companion	Pass	`cpu-and-ne`, 3,031 Neural Engine-preferred ops	`146.3` 512-token windows/s
`granite-ane-s512-macos13.mlpackage`	batch 1, sequence 512	Recommended default	Pass	`cpu-and-ne`, 3,209 Neural Engine-preferred ops	`122.2` 512-token windows/s (3-run short mean)
`granite-ane-s1024-macos13.mlpackage`	batch 1, sequence 1024	Experimental	Pass	`cpu-and-ne`, 3,209 Neural Engine-preferred ops	`54.8` 512-token windows/s

Core ML / Hugging Face Parity

The sequence-512 package was also checked against the original Hugging Face model on a small local ranking fixture:

Check	Result
Mean embedding cosine	`0.9999898`
Minimum embedding cosine	`0.9999838`
Maximum absolute embedding delta	`0.0008967`
Mean top-k ranking overlap	`1.0`
Mean nDCG delta	`0.0`
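The parity checks in the table above can be expressed with standard metric definitions. The sketch below computes embedding cosine, maximum absolute delta, top-k overlap, and nDCG@k for two sets of embeddings of the same documents. It uses tiny made-up vectors. The value of k, the relevance labels, and all function names are assumptions, not the author's actual fixture or harness.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def max_abs_delta(a: list[float], b: list[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def rank(query: list[float], docs: dict[str, list[float]]) -> list[str]:
    """Document IDs sorted by descending cosine similarity to the query."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)


def top_k_overlap(a: list[str], b: list[str], k: int) -> float:
    """Fraction of top-k document IDs shared by two rankings."""
    return len(set(a[:k]) & set(b[:k])) / k


def ndcg_at_k(ranking: list[str], relevance: dict[str, float], k: int) -> float:
    """nDCG@k with graded gains and a log2 position discount."""
    dcg = sum(relevance.get(d, 0.0) / math.log2(i + 2) for i, d in enumerate(ranking[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Made-up 2-D embeddings standing in for Hugging Face and Core ML outputs.
hf_docs = {"d1": [1.0, 0.0], "d2": [0.6, 0.8], "d3": [0.0, 1.0]}
cm_docs = {"d1": [0.9999, 0.0005], "d2": [0.6002, 0.7998], "d3": [0.0004, 1.0]}
hf_query, cm_query = [0.8, 0.6], [0.8001, 0.5999]
relevance = {"d2": 1.0}  # hypothetical relevance label
k = 2

cosines = [cosine(hf_docs[d], cm_docs[d]) for d in hf_docs]
max_delta = max(max_abs_delta(hf_docs[d], cm_docs[d]) for d in hf_docs)
hf_rank, cm_rank = rank(hf_query, hf_docs), rank(cm_query, cm_docs)
overlap = top_k_overlap(hf_rank, cm_rank, k)
ndcg_delta = ndcg_at_k(cm_rank, relevance, k) - ndcg_at_k(hf_rank, relevance, k)

print(f"mean cosine {sum(cosines) / len(cosines):.7f}, min {min(cosines):.7f}")
print(f"max |delta| {max_delta:.7f}, top-{k} overlap {overlap}, nDCG delta {ndcg_delta}")
```

Cosine and absolute delta measure numerical drift in the embeddings themselves. Overlap and nDCG delta measure whether that drift is large enough to change retrieval results.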

Local Multilingual Retrieval Details

Full local Core ML run:

Field	Value
Artifact	`granite-ane-s512-macos13.mlpackage`
Backend	Core ML
Benchmark	`MTEB(Multilingual, v2)`
MTEB version	`2.14.5`
Tasks completed	`18/18`
MTEB exceptions	`0`
MIRACL hard-negative subsets	`18/18`
Sequence policy	fixed sequence length 512 with tokenizer padding/truncation
Pooling and normalization	CLS pooling, then L2 normalization
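The pooling and normalization step in the table above can be sketched as follows. The sketch assumes the Core ML package returns per-token hidden states of shape [sequence length][hidden size]. If a package already returns a pooled or normalized vector, only the missing steps apply. The function names are hypothetical.

```python
import math


def cls_pool(hidden_states: list[list[float]]) -> list[float]:
    """CLS pooling: the hidden vector at sequence position 0."""
    return list(hidden_states[0])


def l2_normalize(vec: list[float], eps: float = 1e-12) -> list[float]:
    """Scale vec to unit Euclidean length (eps guards against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def embed(hidden_states: list[list[float]]) -> list[float]:
    """CLS pooling, then L2 normalization, in the order listed in the table above."""
    return l2_normalize(cls_pool(hidden_states))


# Toy [seq_len=3][hidden=2] output; rows 1-2 stand in for other tokens and padding.
hidden = [[3.0, 4.0], [9.0, 9.0], [0.0, 0.0]]
print(embed(hidden))  # [0.6, 0.8]
```

CLS pooling never reads padded positions. With a correct attention mask, the padding added to reach the fixed sequence length also does not affect the CLS vector. After L2 normalization, the dot product of two embeddings equals their cosine similarity.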

Per-task local main scores:

Task	Subsets	Main score
StackOverflowQA	1	`0.81613`
TwitterHjerneRetrieval	1	`0.56869`
AILAStatutes	1	`0.27827`
ArguAna	1	`0.50822`
HagridRetrieval	1	`0.98694`
LegalBenchCorporateLobbying	1	`0.91474`
LEMBPasskeyRetrieval	8	`0.38500`
SCIDOCS	1	`0.20173`
SpartQA	1	`0.67397`
TempReasonL1	1	`0.05192`
TRECCOVID	1	`0.68176`
WinoGrande	1	`0.56795`
BelebeleRetrieval	376	`0.52829`
MLQARetrieval	98	`0.60411`
StatcanDialogueDatasetRetrieval	4	`0.55207`
WikipediaRetrievalMultilingual	16	`0.83237`
CovidRetrieval	1	`0.68499`
MIRACLRetrievalHardNegatives	18	`0.55417`
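The 57.73 headline value in the evaluation snapshot can be recomputed from this table. The sketch below takes the unweighted mean over the 18 tasks and scales it by 100. Each task counts once regardless of how many subsets it has, so BelebeleRetrieval's 376 subsets carry the same weight as ArguAna's single subset. The dictionary and function names are illustrative.

```python
# Main scores copied from the per-task table above.
MAIN_SCORES = {
    "StackOverflowQA": 0.81613,
    "TwitterHjerneRetrieval": 0.56869,
    "AILAStatutes": 0.27827,
    "ArguAna": 0.50822,
    "HagridRetrieval": 0.98694,
    "LegalBenchCorporateLobbying": 0.91474,
    "LEMBPasskeyRetrieval": 0.38500,
    "SCIDOCS": 0.20173,
    "SpartQA": 0.67397,
    "TempReasonL1": 0.05192,
    "TRECCOVID": 0.68176,
    "WinoGrande": 0.56795,
    "BelebeleRetrieval": 0.52829,
    "MLQARetrieval": 0.60411,
    "StatcanDialogueDatasetRetrieval": 0.55207,
    "WikipediaRetrievalMultilingual": 0.83237,
    "CovidRetrieval": 0.68499,
    "MIRACLRetrievalHardNegatives": 0.55417,
}


def headline_score(scores: dict[str, float]) -> float:
    """Unweighted mean over tasks (each task counts once), scaled to 0-100."""
    return 100 * sum(scores.values()) / len(scores)


print(f"{len(MAIN_SCORES)} tasks, score {headline_score(MAIN_SCORES):.2f}")
# 18 tasks, score 57.73
```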

These measurements are local validation results for the conversion artifacts, not official benchmark claims. Validate placement and throughput in the target application process before using the packages for performance comparisons.

License and Attribution

The original model card lists the base model license as Apache 2.0. This conversion repository follows that license metadata and attributes the base model to IBM Granite. For the official model, documentation, and limitations, refer to the original IBM Granite model card.

Model Tree

Base model: ibm-granite/granite-embedding-97m-multilingual-r2
This model: XReyRobert/granite-embedding-97m-multilingual-r2-coreml-ane-experimental
Collection: Granite playground (Granite Core ML and Apple Neural Engine experiments)