EmbeddingGemma 300m β€” Core AI export

google/embeddinggemma-300m as a single static Core AI graph: the full sentence-transformers pipeline (transformer β†’ mean pooling β†’ dense projection β†’ L2 normalize) runs in-graph, so one call returns a normalized 768-d embedding. On-device semantic search / RAG for macOS 27 / iOS 27 beta.

Runs out of the box with CoreAIKit's TextEmbedder:

let embedder = try await TextEmbedder()   // downloads this repo
let doc = try await embedder.embed(document: "Tokyo is the capital of Japan.")
let query = try await embedder.embed(query: "what is the capital of Japan")
let score = TextEmbedder.cosineSimilarity(doc, query)

Retrieval prompt prefixes (task: search result | query: / title: none | text: ) are applied automatically by TextEmbedder.

Bundle layout

model/
β”œβ”€β”€ embeddinggemma-300m_float32_static.aimodel
β”œβ”€β”€ tokenizer/            (HF tokenizer files)
└── reference.json        (torch reference cosines used by the parity test)

Graph contract

name shape dtype
input input_ids [1, 256] int32 (pad id 0, mask 0 over padding)
input attention_mask [1, 256] int32
output embedding [1, 768] fp32, L2-normalized

Precision: fp32. Cross-runtime parity vs the torch SentenceTransformer pipeline is exact to 6 decimal places (see reference.json). fp16 variants (full cast AND mixed-precision autocast) produce NaN embeddings on-device β€” Gemma3 activations overflow half precision β€” so fp32 is shipped; a smaller int8 variant is future work.

License

Gemma Terms of Use (see the upstream model card). Conversion script: this repo's sibling, based on apple/coreai-models' recipe patterns (BSD-3-Clause).

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support