Staleguard (int8 ONNX)

A 3-class code↔doc coherence cross-encoder: given a (code premise, prose claim) pair, it predicts one of {entailment, neutral, contradiction}. Fine-tuned from microsoft/unixcoder-base, then exported to ONNX and dynamically quantized to int8 (per-channel, avx512_vnni target) for CPU inference.

GitHub: Arthur920/Staleguard · Docs: arthur920.github.io/Staleguard

  • Artifact: model_quantized.onnx (~121 MB, ~4× smaller than the fp32 checkpoint)
  • Labels: 0=entailment, 1=neutral, 2=contradiction
  • Lead metric: precision on the contradiction (alert) class, measured on a held-out, repo-disjoint eval split: ~87.6% precision / ~89.9% recall.
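
For readers who want to reproduce the lead metric on their own labeled pairs, the following is a minimal sketch of per-class precision and recall with contradiction (label 2) treated as the alert class. The function name `alert_precision_recall` and the toy labels are illustrative assumptions, not part of the repository, and the numbers it prints are unrelated to the reported eval results.

```python
CONTRADICTION = 2  # label id from the Labels bullet above

def alert_precision_recall(y_true, y_pred, positive=CONTRADICTION):
    """Precision and recall for one class treated as the positive (alert) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators (no predicted or no true positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels, made up for illustration only (not the model's eval data).
y_true = [2, 2, 2, 0, 1, 1, 0, 2]
y_pred = [2, 2, 1, 0, 2, 1, 0, 2]
precision, recall = alert_precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "how many alerts were real contradictions", which is presumably why it leads for an alerting use case; recall answers "how many real contradictions were caught".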

Usage

# Requires optimum[onnxruntime], transformers, and torch (for return_tensors="pt").
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

repo = "Arthur920/staleguard"
tok = AutoTokenizer.from_pretrained(repo)
# Load the int8 artifact explicitly by file name.
model = ORTModelForSequenceClassification.from_pretrained(
    repo, file_name="model_quantized.onnx")

# Encode the (code premise, prose claim) pair as a single cross-encoder input.
inputs = tok("def add(a, b): return a + b",
             "The function returns the sum of a and b.",
             truncation=True, max_length=192, return_tensors="pt")
logits = model(**inputs).logits  # shape (1, 3): one score per label
print(model.config.id2label[int(logits.argmax(-1))])
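
If the goal is alerting rather than plain classification, one option is to turn the logits into probabilities and flag a pair only when the contradiction probability clears a threshold. The sketch below assumes the label order given above; the `should_alert` helper, the 0.8 threshold, and the example logits are hypothetical and would need tuning against your own data.

```python
import math

ID2LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def should_alert(logits, threshold=0.8):
    """Return (alert, probs); alert is True when P(contradiction) >= threshold.

    The 0.8 default is an arbitrary illustration, not a recommended value.
    """
    probs = softmax(logits)
    return probs[2] >= threshold, probs

# Hypothetical logits; with the usage snippet above, pass logits[0].tolist().
example_logits = [-1.0, 0.5, 3.0]
alert, probs = should_alert(example_logits)
for idx, p in enumerate(probs):
    print(f"{ID2LABEL[idx]}: {p:.3f}")
print("alert:", alert)
```

Raising the threshold trades recall for precision on the alert class; lowering it does the opposite.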

Notes

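Int8 dynamic quantization stores the Linear/MatMul weights as int8; activations are kept in fp32 between operators and quantized on the fly at inference time, so no calibration data is needed. A parity check against the fp32 checkpoint showed matching argmax labels on sample pairs. To re-quantize from the fp32 export, run model/quantize.py.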
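
A parity check of this kind can be sketched as follows. This is not the repository's check; it only assumes you have already collected logits from the fp32 and int8 models on the same pairs, in the same order. The `parity_report` helper and the logit values are made up for illustration.

```python
def argmax(row):
    """Index of the largest value in a list of class logits."""
    return max(range(len(row)), key=lambda i: row[i])

def parity_report(fp32_logits, int8_logits):
    """Compare two lists of per-example logit rows (same examples, same order)."""
    if not fp32_logits or len(fp32_logits) != len(int8_logits):
        raise ValueError("both models must be scored on the same non-empty set")
    rows = list(zip(fp32_logits, int8_logits))
    # Fraction of examples where both models pick the same label.
    agree = sum(argmax(a) == argmax(b) for a, b in rows)
    # Largest elementwise logit gap, as a rough measure of quantization drift.
    max_abs_diff = max(abs(x - y) for a, b in rows for x, y in zip(a, b))
    return {
        "label_agreement": agree / len(rows),
        "max_abs_logit_diff": max_abs_diff,
    }

# Made-up logits for three pairs, for illustration only.
fp32 = [[2.1, 0.3, -1.0], [-0.5, 1.8, 0.2], [-1.2, 0.1, 2.7]]
int8 = [[2.0, 0.4, -0.9], [-0.4, 1.7, 0.3], [-1.0, 0.0, 2.5]]
report = parity_report(fp32, int8)
print(report)
```

Matching argmax labels is a weaker guarantee than matching logits, so tracking the logit gap as well helps spot pairs that sit close to a decision boundary.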
