Concept Verifier (ONNX, int8)

A ModernBERT-base classifier fine-tuned for concept-level verification in the EXAMI knowledge-graph pipeline. Distributed in ONNX format with a dynamic int8 quantized variant for efficient CPU inference.

Files

| File | Purpose | Size |
| --- | --- | --- |
| model.onnx | FP32 ONNX export (reference) | ~600 MB |
| model_int8.onnx | Dynamic int8 quantized for deployment | ~150 MB |
| config.json | HuggingFace config | |
| tokenizer.json / tokenizer_config.json | Fast tokenizer | |

Usage (onnxruntime)

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the tokenizer from the repository directory (here, the current directory).
tok = AutoTokenizer.from_pretrained(".")
sess = ort.InferenceSession("model_int8.onnx",
                             providers=["CPUExecutionProvider"])

enc = tok("your concept text here",
           return_tensors="np", padding=True, truncation=True, max_length=128)
# Keep only the inputs the ONNX graph declares, cast to int64.
input_names = {i.name for i in sess.get_inputs()}
feed = {k: v.astype(np.int64) for k, v in enc.items() if k in input_names}
logits = sess.run(None, feed)[0]  # shape: (batch, num_labels)
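
The session returns raw logits rather than probabilities. A minimal sketch of turning one row of logits into a predicted class index and its probability follows. It uses only the standard library, the example logits are made up, and the mapping from index to label name (for example, which index means real_concept) is an assumption that should be checked against id2label in config.json.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(row):
    """Return (index, probability) of the highest-scoring class."""
    probs = softmax(row)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, probs[idx]


# Illustrative two-class logits only; not output from the real model.
example_logits = [-1.2, 2.3]
idx, prob = top_class(example_logits)
print(f"predicted index: {idx}, probability: {prob:.4f}")
```

With the session output from the usage snippet, `top_class(logits[0].tolist())` would give the prediction for the first input.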

Production context

This model is one of two classifiers used in the EXAMI knowledge-graph pipeline. The other (the merge verifier) handles same-as / merge classification.

For details on how this model fits into the broader incremental knowledge-graph architecture, see the merge-verifier model card and its accompanying CLUSTERING_STRATEGY.md and MERGE_AND_CLUSTERING_ARCHITECTURE.md documents.

Notes on int8 quantization: partial regression validated on test set

Validated on a 5,000-row stratified test sample (same seed=42 split as fp32):

| Metric | fp32 test (full 21,651) | int8 test (5k sample) | Δ |
| --- | --- | --- | --- |
| real_concept P | 0.9361 | 0.9371 | +0.0010 (tied) |
| real_concept R | 0.9389 | 0.9045 | −0.0344 |
| macro_f0.5 | 0.9165 | 0.8944 | −0.0221 |

The int8 model loses recall while precision stays flat: it admits about 3.4 percentage points fewer valid concepts than fp32 (≈464 missed admissions per 13,552 valid concepts in the test set). Precision is intact.
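
The arithmetic behind these figures can be sketched as follows. The precision and recall values are the real_concept numbers reported above. Applying the recall measured on the 5k sample to the full test set is an assumption. From the rounded inputs the estimate lands in the mid-460s, close to the ≈464 quoted above; the small gap is presumably rounding. The per-class F0.5 computed here is not the reported macro_f0.5, which averages over all classes.

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# real_concept precision/recall as reported in this model card.
FP32 = {"precision": 0.9361, "recall": 0.9389}
INT8 = {"precision": 0.9371, "recall": 0.9045}
VALID_CONCEPTS = 13_552  # valid concepts in the full test set

recall_drop = FP32["recall"] - INT8["recall"]
# Assumption: recall measured on the 5k sample carries over to the full set.
missed_estimate = recall_drop * VALID_CONCEPTS

f05_fp32 = f_beta(**FP32)
f05_int8 = f_beta(**INT8)

print(f"recall drop: {recall_drop:.4f}")
print(f"estimated missed admissions: {missed_estimate:.0f}")
print(f"real_concept F0.5: fp32={f05_fp32:.4f}, int8={f05_int8:.4f}")
```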

Deployment guidance:

  • Use int8 if file size matters (151 MB vs 599 MB) and you can tolerate a recall loss of about 3.4 percentage points. The dropped concepts are recoverable via re-extraction from another document.
  • Use fp32 if you need maximum recall.
  • 95.4% of MatMuls are properly quantized (vs 50.3% on DeBERTa-v3-large, which is broken; see the v2 model card). ModernBERT's standard transformer architecture round-trips through quantize_dynamic cleanly.
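
For readers who want to regenerate an int8 file from the fp32 export, a minimal sketch using onnxruntime's quantize_dynamic is shown here. The exact settings used for the published model_int8.onnx are not stated in this card, so signed int8 weights with otherwise default options is an assumption, and the helper names quantize_to_int8 and size_ratio are hypothetical.

```python
from pathlib import Path


def quantize_to_int8(src="model.onnx", dst="model_int8.onnx"):
    """Write a dynamically quantized int8 copy of src to dst and return dst."""
    # Imported lazily so this module can be loaded without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    # Assumption: signed int8 weights and otherwise default settings.
    quantize_dynamic(model_input=src, model_output=dst,
                     weight_type=QuantType.QInt8)
    return Path(dst)


def size_ratio(src, dst):
    """Return size(dst) / size(src) for a quick before/after comparison."""
    return Path(dst).stat().st_size / Path(src).stat().st_size
```

Calling `quantize_to_int8()` in the repository directory followed by `size_ratio("model.onnx", "model_int8.onnx")` gives a quick check that the output is roughly a quarter of the fp32 size, in line with the file sizes listed above.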

Diagnostic command (for reproducing the integrity check):

from collections import Counter
import onnx
m = onnx.load("model_int8.onnx")
ops = Counter(n.op_type for n in m.graph.node)
fp32_mm = ops.get("MatMul", 0)
int8_mm = ops.get("MatMulInteger", 0)
print(f"MatMul fp32 left: {fp32_mm}; MatMulInteger: {int8_mm}; "
       f"quantized %: {100*int8_mm/(fp32_mm+int8_mm):.1f}")
# ModernBERT-base: 66.2% (with surrounding fp32 ops normal; model accuracy fine)
# DeBERTa-v3-large: 50.3% (with disentangled-attention partially fp32; broken)
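
A slightly more defensive variant of the same check is sketched here. It works on plain (name, op_type) pairs so it can be tried without a model file, avoids a division by zero when a graph has no MatMul nodes, and also lists which MatMuls stayed in fp32. The function name and the toy node names are hypothetical.

```python
from collections import Counter


def matmul_quantization_report(nodes):
    """Summarize MatMul quantization from (name, op_type) pairs.

    Returns (percent_quantized, names_of_remaining_fp32_matmuls).
    percent_quantized is None when there are no MatMul or MatMulInteger nodes.
    """
    nodes = list(nodes)
    ops = Counter(op for _, op in nodes)
    fp32_mm = ops.get("MatMul", 0)
    int8_mm = ops.get("MatMulInteger", 0)
    total = fp32_mm + int8_mm
    percent = 100 * int8_mm / total if total else None
    leftover = [name for name, op in nodes if op == "MatMul"]
    return percent, leftover


# Toy input with made-up node names, only to show the shape of the result.
toy = [("attn_qkv", "MatMulInteger"), ("attn_scores", "MatMul"),
       ("mlp_in", "MatMulInteger"), ("mlp_out", "MatMulInteger")]
percent, leftover = matmul_quantization_report(toy)
print(f"{percent:.1f}% quantized; fp32 MatMuls left: {leftover}")
```

With the real model, the pairs could be built as `[(n.name, n.op_type) for n in onnx.load("model_int8.onnx").graph.node]`.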