EmbeddingGemma-300M — Tensor G4 NPU (Android 17 / Beta-SDK recompile)

AOT-compiled EmbeddingGemma-300M for the Google Tensor G4 NPU (darwinn EdgeTPU), built with the official Google Tensor ML SDK (Beta).

⚠️ Android firmware note (why this repo exists). The G4 NPU bytecode (DGC) is compiled against a specific Tensor NPU firmware version. Builds compiled against Android 16 firmware fail to load on Android 17: the newer NPU runtime reports "Failed to get Darwinn graph" / SB-invocation error. This repo holds an A17-targeted recompile built with the current Beta SDK. The older Android 16 build is at xThr45hx/EmbeddingGemma-300M-Tensor-G4-NPU.
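A download script could encode the firmware split described above as a small lookup. This is only a sketch: `BUILDS` and `select_build` are hypothetical names, the repo IDs are the two named on this page, and behaviour on Android versions other than 16 and 17 is not documented here, so the function refuses them rather than guessing.

```python
# Hypothetical lookup: Android major version -> repo holding the matching build.
BUILDS = {
    16: "xThr45hx/EmbeddingGemma-300M-Tensor-G4-NPU",  # older build, Android 16 firmware
    17: "xThr45hx/EmbeddingGemma-300M-Tensor-G4-A17",  # this repo, A17-targeted recompile
}


def select_build(android_major):
    """Return the repo ID whose build targets the given Android major version."""
    try:
        return BUILDS[android_major]
    except KeyError:
        # Assumption: no statement is made here about other Android versions.
        raise ValueError(f"no known G4 NPU build for Android {android_major}") from None
```

For example, `select_build(17)` returns the ID of this repo, while `select_build(15)` raises `ValueError`.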

🚧 Status: on-device A17 load verification in progress. This build compiles cleanly (2265/2265 ops, single partition, DGC0 + rio_a0). The next step is confirming that it loads and runs on a real Android 17 device. Treat the build as provisional until this note is updated.

File

embeddinggemma-300M_seq256_Google_Tensor_G4.tflite: seq256 (at most 256 tokens in one pass), 768-dimensional output embedding. Intended as the efficient choice for short RAG chunks and queries. (A seq512 long-form variant may follow.)
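Text longer than 256 tokens has to be split before it reaches a seq256 build. The sketch below shows one way to cut a list of token IDs into 256-ID windows and pad the last one. It assumes the compiled graph expects exactly 256 IDs per call and that 0 is an acceptable padding ID; neither is stated on this page, so check the tokenizer and the model's input signature. `to_windows`, `SEQ_LEN`, and `PAD_ID` are hypothetical names.

```python
SEQ_LEN = 256  # sequence length of this build (from the file name)
PAD_ID = 0     # assumption: replace with the tokenizer's real padding ID


def to_windows(token_ids, seq_len=SEQ_LEN, pad_id=PAD_ID):
    """Split token IDs into consecutive windows of exactly seq_len IDs.

    The last window is right-padded with pad_id. An empty input yields
    an empty list.
    """
    windows = []
    for start in range(0, len(token_ids), seq_len):
        window = list(token_ids[start:start + seq_len])
        window += [pad_id] * (seq_len - len(window))
        windows.append(window)
    return windows
```

Each window would then be embedded in its own pass, giving one 768-dimensional vector per window. How those vectors are combined (kept separate per chunk, averaged, and so on) is a retrieval design choice this sketch leaves open.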

How it was compiled

Input: embeddinggemma-300M_seq256_mixed-precision.tflite from litert-community/embeddinggemma-300m (the plain, non-device-compiled mixed-precision file).
SDK: ai-edge-litert-nightly plus ai-edge-litert-sdk-google-tensor==2.1.5. Compiled with the official aot_compile(target=[TENSOR_G4]) call and no extra flags (mixed-precision path). The google_tensor_backend import is mandatory.
Result: 2265 / 2265 ops offloaded to 1 partition (fully fused, no fallback). Output size: 196,993,056 bytes, containing the markers DGC0, rio_a0, and tfl3. SHA-256: eec2daf64f07f8cc84a92080c5e2afb00fc6bdf0cb688e00638c0229620b0b4a.
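The size, hash, and markers listed in the result can be checked against a downloaded copy before pushing it to a device. The following is a minimal sketch using only the Python standard library. `check_artifact` is a hypothetical helper, and the marker search is a case-insensitive substring match because the exact byte spelling of the markers inside the file is an assumption.

```python
import hashlib
from pathlib import Path

# Values copied from the "Result" line above.
EXPECTED_SIZE = 196_993_056
EXPECTED_SHA256 = "eec2daf64f07f8cc84a92080c5e2afb00fc6bdf0cb688e00638c0229620b0b4a"
EXPECTED_MARKERS = (b"DGC0", b"rio_a0", b"tfl3")


def check_artifact(path, size=EXPECTED_SIZE, sha256=EXPECTED_SHA256,
                   markers=EXPECTED_MARKERS):
    """Return a list of problems found; an empty list means every check passed."""
    data = Path(path).read_bytes()  # about 197 MB, read in one go for simplicity
    problems = []
    if len(data) != size:
        problems.append(f"size {len(data)} != expected {size}")
    digest = hashlib.sha256(data).hexdigest()
    if digest != sha256:
        problems.append(f"sha256 {digest} != expected {sha256}")
    lowered = data.lower()  # case-insensitive marker search (assumption, see above)
    for marker in markers:
        if marker.lower() not in lowered:
            problems.append(f"marker {marker!r} not found")
    return problems
```

Usage: `check_artifact("embeddinggemma-300M_seq256_Google_Tensor_G4.tflite")` should return `[]` for an intact download. A passing check only shows the file matches what was published; it says nothing about whether the file loads on a given firmware.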

License

Gemma — inherits from EmbeddingGemma. See the base model card for terms.
