EmbeddingGemma-300M: Tensor G4 NPU (Android 17 / Beta-SDK recompile)

AOT-compiled EmbeddingGemma-300M for the Google Tensor G4 NPU (darwinn EdgeTPU), built with the official Google Tensor ML SDK (Beta).

⚠️ Android firmware note (why this repo exists). The G4 NPU bytecode (DGC) is compiled against a specific Tensor NPU firmware. Builds compiled against Android 16 firmware fail to load on Android 17 (newer NPU runtime → "Failed to get Darwinn graph" / SB-invocation error). This repo holds an A17-targeted recompile on the current Beta SDK. The older Android 16 build: xThr45hx/EmbeddingGemma-300M-Tensor-G4-NPU.

🚧 Status: on-device A17 load verification in progress. This build compiles cleanly (2265/2265 ops, single partition, DGC0 + rio_a0 markers present); the next step is confirming that it loads and runs on a real Android 17 device. Treat the build as provisional until this note is updated.

File

  • embeddinggemma-300M_seq256_Google_Tensor_G4.tflite: seq256 (max 256 tokens in one pass), 768-d output. The efficient RAG workhorse for short chunks/queries. (A seq512 long-form variant may follow.)
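
Because the seq256 build accepts at most 256 tokens per pass, longer documents have to be split before embedding. The following is a minimal sketch of one way to do that, not the author's pipeline: it assumes the text has already been tokenized into a list of integer IDs, and the helper name `split_into_windows` and the 32-token overlap are hypothetical choices for illustration. Any special tokens or prompt prefixes the tokenizer adds also count toward the 256-token budget.

```python
from typing import List

MAX_TOKENS = 256  # sequence length of the seq256 build, per this card


def split_into_windows(token_ids: List[int],
                       max_tokens: int = MAX_TOKENS,
                       overlap: int = 32) -> List[List[int]]:
    """Split token IDs into windows of at most max_tokens.

    Consecutive windows share `overlap` tokens (an illustrative default,
    not a value taken from the model card).
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    if not token_ids:
        return []

    step = max_tokens - overlap
    windows: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_tokens])
        if start + max_tokens >= len(token_ids):
            break  # this window already reaches the end of the input
    return windows
```

Each window can then be embedded in its own pass. Whether overlapping windows help retrieval quality depends on the corpus, so the overlap is worth tuning rather than taking as given.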

How it was compiled

  • Input: embeddinggemma-300M_seq256_mixed-precision.tflite from litert-community/embeddinggemma-300m (the plain, non-device-compiled mixed-precision file).
  • SDK: ai-edge-litert-nightly + ai-edge-litert-sdk-google-tensor==2.1.5; official aot_compile(target=[TENSOR_G4]), no flags (mixed-precision path); mandatory google_tensor_backend import.
  • Result: 2265 / 2265 ops offloaded to 1 partition (fully fused, no fallback). Output 196,993,056 bytes, markers DGC0 + rio_a0 + tfl3. SHA-256 eec2daf64f07f8cc84a92080c5e2afb00fc6bdf0cb688e00638c0229620b0b4a.
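
A downloaded copy can be checked against the size, SHA-256, and byte markers listed above. The following is a minimal sketch using only the Python standard library; the function name `check_artifact` is hypothetical, and the marker test simply looks for the byte strings anywhere in the file, which is an assumption about how the markers appear on disk. The tfl3 identifier is left out of the default marker list because the card does not state its offset or byte case.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable

# Values copied from the "Result" line of this card.
EXPECTED_SIZE = 196_993_056
EXPECTED_SHA256 = "eec2daf64f07f8cc84a92080c5e2afb00fc6bdf0cb688e00638c0229620b0b4a"
EXPECTED_MARKERS = (b"DGC0", b"rio_a0")


def check_artifact(path: str,
                   expected_size: int = EXPECTED_SIZE,
                   expected_sha256: str = EXPECTED_SHA256,
                   markers: Iterable[bytes] = EXPECTED_MARKERS) -> Dict[str, bool]:
    """Return one pass/fail flag per check for the file at `path`."""
    data = Path(path).read_bytes()  # about 197 MB for the real file
    results = {
        "size": len(data) == expected_size,
        "sha256": hashlib.sha256(data).hexdigest() == expected_sha256.lower(),
    }
    for marker in markers:
        # Assumption: each marker occurs as a plain byte string in the file.
        results["marker:" + marker.decode("ascii")] = marker in data
    return results


if __name__ == "__main__":
    report = check_artifact("embeddinggemma-300M_seq256_Google_Tensor_G4.tflite")
    for name, ok in report.items():
        print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```

A passing check only confirms that the file matches this card. It says nothing about whether the build loads on a given device firmware.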

License

Gemma (inherited from EmbeddingGemma). See the base model card for terms.

