embeddinggemma-300m-lcsh: ONNX (transformers.js)

ONNX export of the LCSH-fine-tuned EmbeddingGemma-300M (LoRA + Matryoshka, merged to a standalone backbone), packaged for in-browser / on-device use via transformers.js and ONNX Runtime Web. This is the embedder behind the LCSHBench retrieval leaderboard ("EmbeddingGemma-300M, fine-tuned"), exported for a Chrome extension.

Files

| File | Precision | Size | Use |
|---|---|---|---|
| onnx/model.onnx | fp32 | 1.2 GB | reference / server (cos@256 = 1.000 vs PyTorch) |
| onnx/model_quantized.onnx | int8 (q8) | 296 MB | what the browser loads (cos@256 ≈ 0.97) |

The model outputs sentence_embedding directly: the trained Dense/Matryoshka heads are baked into the graph. The native output is 768-dimensional; truncate it to the first 256 dimensions and L2-normalize again (Matryoshka) to match the on-device lcsh.db. No prompt/prefix is applied (the fine-tune is symmetric).
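The truncate-and-renormalize step can be sketched in plain JavaScript as follows. This is a minimal sketch, assuming the embedding is already available as a flat array (or typed array) of 768 numbers; the helper name truncateAndNormalize is illustrative and not part of transformers.js.

```javascript
// Hypothetical helper: keep the first `dims` components of an embedding
// and rescale the result to unit L2 norm (Matryoshka-style truncation).
function truncateAndNormalize(embedding, dims = 256) {
  if (embedding.length < dims) {
    throw new RangeError(`expected at least ${dims} values, got ${embedding.length}`);
  }
  // Copy the leading components so the caller's array is not modified.
  const head = Float32Array.from(embedding.slice(0, dims));

  // Compute the L2 norm of the truncated vector.
  let sumSquares = 0;
  for (const v of head) sumSquares += v * v;
  const norm = Math.sqrt(sumSquares);

  // An all-zero vector cannot be normalized; return it unchanged.
  if (norm === 0) return head;

  for (let i = 0; i < head.length; i++) head[i] /= norm;
  return head;
}
```

Re-normalizing after truncation matters because the first 256 components of a unit-length 768-dimensional vector generally do not have unit length themselves.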

Usage (transformers.js)

import { pipeline } from '@huggingface/transformers';
const extractor = await pipeline('feature-extraction',
  'kltng/embeddinggemma-300m-lcsh-onnx', { dtype: 'q8' });
const out = await extractor(text, { pooling: 'none' });   // already sentence_embedding
// Keep the first 256 dimensions of the embedding, L2-normalize them,
// then compare by cosine similarity against the lcsh.db vectors.

Pair with the companion vector store kltng/lcsh-db-ft (an lcsh.db re-embedded with this exact q8 model, so build-time and query-time vectors are consistent).
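For illustration, ranking stored vectors against a query might look like the sketch below. The storage layout of lcsh.db is not described here, so the sketch assumes the vectors have already been loaded into memory as an array of { label, vector } objects; that shape, and the names dot and topK, are hypothetical. Because both sides are L2-normalized, the dot product equals the cosine similarity.

```javascript
// Dot product of two equal-length vectors. For L2-normalized inputs
// this is the same value as their cosine similarity.
function dot(a, b) {
  if (a.length !== b.length) {
    throw new RangeError(`length mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Hypothetical in-memory search: `entries` is assumed to be an array of
// { label, vector } objects whose vectors are already unit length.
// Returns the k highest-scoring entries, best first.
function topK(query, entries, k = 5) {
  return entries
    .map(({ label, vector }) => ({ label, score: dot(query, vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A brute-force scan like this is linear in the number of stored vectors; whether that is fast enough depends on the size of the vector store.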

License

Derived from google/embeddinggemma-300m and distributed under the Gemma Terms of Use. Use is subject to Google's Gemma Prohibited Use Policy.
