nomic-codesearch-onnx (INT8 Quantized)
This model is a fine-tuned version of nomic-ai/nomic-embed-text-v1.5 trained specifically for semantic code search on Python code snippets, then exported to ONNX and dynamically quantized to INT8 for efficient on-device execution (CPU/Mobile).
Quantization shrinks the model from 530 MB to 100 MB (a ~5x reduction) while NDCG@10 drops only from ~0.71 to ~0.68 (see Metrics), which makes it well suited to on-device deployment on Android, iOS, or other resource-constrained environments.
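For readers who want to reproduce the INT8 step, the following is a minimal sketch using ONNX Runtime's `quantize_dynamic`. It is not the exact script used for this model; the helper names `quantize_to_int8` and `size_ratio` and the file paths passed in are assumptions for illustration.

```python
from pathlib import Path


def size_ratio(original_bytes: int, quantized_bytes: int) -> float:
    """Compression factor, e.g. 530 MB -> 100 MB gives 5.3."""
    return original_bytes / quantized_bytes


def quantize_to_int8(fp32_path: str, int8_path: str) -> float:
    """Dynamically quantize ONNX weights to QInt8 and return the size ratio.

    Hypothetical helper: the paths are whatever your own export produced.
    """
    # Imported lazily so this module still loads where onnxruntime is absent.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(
        model_input=fp32_path,
        model_output=int8_path,
        weight_type=QuantType.QInt8,
    )
    return size_ratio(
        Path(fp32_path).stat().st_size,
        Path(int8_path).stat().st_size,
    )


# Sanity check on the sizes reported in this card.
print(round(size_ratio(530, 100), 1))  # 5.3
```

Dynamic quantization needs no calibration data, which is presumably why it suits an embedding model exported once and shipped to many devices.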
Model Details
- Base Model: nomic-ai/nomic-embed-text-v1.5 (137M parameters, 768-dimensional embeddings)
- Fine-Tuning Dataset: code-search-net/code_search_net (Python split). Trained on 50,000 positive (docstring, function) pairs using Multiple Negatives Ranking Loss (MNR).
- Training Acceleration: Apple Silicon (M4 MPS)
- Export Format: ONNX (Opset 17)
- Quantization: Dynamic INT8 Quantization (weights quantized to QInt8, activation optimized)
- Dimensions: 768 (supports Matryoshka Representation Learning down to 256 dimensions)
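Matryoshka support means the leading components of each vector can be used on their own. A minimal sketch of that truncation follows, assuming plain slicing followed by re-normalisation is sufficient for this export; the upstream model card may recommend additional preprocessing. `truncate_embeddings` is a hypothetical helper and the random vectors stand in for real model output.

```python
import numpy as np


def truncate_embeddings(embeddings: np.ndarray, dim: int = 256) -> np.ndarray:
    """Keep the first `dim` components and re-normalise each row to unit length."""
    if not 0 < dim <= embeddings.shape[1]:
        raise ValueError(f"dim must be in 1..{embeddings.shape[1]}, got {dim}")
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.maximum(norms, 1e-12)


# Stand-in for real model output: 4 random unit vectors of width 768.
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_embeddings(full, dim=256)
print(small.shape)  # (4, 256)
```

Storing 256 instead of 768 floats per snippet cuts index size to a third, at some cost in retrieval quality that this card does not quantify.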
Metrics
| Config | Size | Mean Cosine Drift | NDCG@10 (Code Search) |
|---|---|---|---|
| Baseline Model | 530 MB | 0.0 | ~0.48 |
| Fine-Tuned FP32 ONNX | 530 MB | 0.0 | ~0.71 |
| Fine-Tuned INT8 ONNX | 100 MB | ~0.07 | ~0.68 |
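The card does not spell out how the two metric columns were computed. The sketch below shows one common reading, as an assumption: cosine drift as the mean of `1 - cos(reference, quantized)` over paired embeddings, and NDCG@10 with the standard log2 discount. Both function names are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_cosine_drift(reference: list[list[float]], quantized: list[list[float]]) -> float:
    """Average of 1 - cos(ref_i, quant_i); 0.0 means the outputs point the same way."""
    drifts = [1.0 - cosine(r, q) for r, q in zip(reference, quantized)]
    return sum(drifts) / len(drifts)


def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """NDCG@k for one query; `relevances` lists graded relevance, best-ranked first."""
    def dcg(rels: list[float]) -> float:
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# One relevant snippet, retrieved at rank 3: (1 / log2(4)) / (1 / log2(2)).
print(ndcg_at_k([0, 0, 1, 0, 0]))  # 0.5
```

Under this definition, the reported figure for a whole benchmark would be the mean of `ndcg_at_k` over all queries.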
Python Quickstart
To run semantic code search or generate embeddings locally using this ONNX model:
1. Install Dependencies
```bash
pip install onnxruntime transformers numpy
```
2. Run Inference
```python
import os

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# Load tokenizer and ONNX session
# Ensure config.json, tokenizer.json, vocab.txt, etc., are in the same directory
model_dir = "./"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
session = ort.InferenceSession(os.path.join(model_dir, "model_int8.onnx"))


def embed(texts: list[str], max_length: int = 512) -> np.ndarray:
    """Return L2-normalised sentence embeddings, shape (len(texts), 768)."""
    encoded = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="np",
    )
    outputs = session.run(
        ["sentence_embedding"],
        {
            "input_ids": encoded["input_ids"].astype(np.int64),
            "attention_mask": encoded["attention_mask"].astype(np.int64),
        },
    )
    embeddings = outputs[0]  # (batch, 768)
    # L2 normalise so dot-product == cosine similarity
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.maximum(norms, 1e-12)


# Embed query and snippets
snippets = [
    "def add(a, b): return a + b",
    "def binary_search(arr, target): ...",
    "SELECT * FROM users WHERE age > 18",
]
query = "function that sums two numbers"
query_emb = embed([query])
code_embs = embed(snippets)

# Calculate similarity (dot product of L2-normalized embeddings)
scores = (query_emb @ code_embs.T)[0]
for idx, score in enumerate(scores):
    print(f"[{score:.4f}] {snippets[idx]}")
```
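The quickstart prints scores in input order. For an actual search you would usually sort them and keep the best few. A minimal sketch follows, assuming unit-length embeddings like those returned by `embed`; `top_k` is a hypothetical helper, and the 2-D vectors are toy stand-ins so the example runs without the model.

```python
import numpy as np


def top_k(query_emb: np.ndarray, code_embs: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Return (snippet index, score) pairs for the k highest dot-product scores."""
    scores = code_embs @ query_emb.reshape(-1)  # one score per snippet
    k = min(k, scores.shape[0])
    order = np.argsort(-scores)[:k]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]


# Toy unit vectors standing in for real embeddings.
toy_query = np.array([[1.0, 0.0]])
toy_codes = np.array([[0.0, 1.0], [1.0, 0.0], [0.6, 0.8]])

for idx, score in top_k(toy_query, toy_codes, k=2):
    print(idx, round(score, 2))
```

With real data, pass `query_emb` and `code_embs` from the quickstart and map each returned index back into `snippets`.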
On-Device Deployment (Android)
This model has been successfully deployed inside a native Android application using:
- ONNX Runtime Android AAR (com.microsoft.onnxruntime:onnxruntime-android) for CPU inference.
- Custom WordPiece tokenizer in Kotlin (BertTokenizer.kt) that tokenizes strings directly on-device, with no Python dependency.
- Coroutine-based asynchronous loading, so the 100 MB model loads in the background without blocking the UI thread.
For complete Android source files (MainActivity, OnnxEmbedder, and BertTokenizer), please refer to the GitHub repository: CoderOMaster/nomic-codesearch-android.
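The Kotlin tokenizer itself lives in the repository above and is not reproduced here. To illustrate what a WordPiece tokenizer has to do on-device, here is a Python sketch of the greedy longest-match-first split for a single word. The function name and the toy vocabulary are assumptions; a real tokenizer also lowercases, splits on whitespace and punctuation, adds `[CLS]`/`[SEP]`, and maps tokens to ids from vocab.txt.

```python
def wordpiece(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first split of one word into WordPiece tokens."""
    pieces: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary for illustration only; the real one comes from vocab.txt.
toy_vocab = {"un", "##sort", "##ed", "binary", "search"}
print(wordpiece("unsorted", toy_vocab))  # ['un', '##sort', '##ed']
```

The token ids produced this way must match those of the Python tokenizer used during export, otherwise on-device embeddings will differ from the ones measured in the metrics table.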
License
This project is licensed under the Apache 2.0 License.