Concept Verifier (ONNX, int8)

A ModernBERT-base classifier fine-tuned for concept-level verification in the EXAMI knowledge-graph pipeline. Distributed in ONNX format with a dynamic int8 quantized variant for efficient CPU inference.

Files

File	Purpose	Size
`model.onnx`	FP32 ONNX export (reference)	~600 MB
`model_int8.onnx`	Dynamic int8 quantized for deployment	~150 MB
`config.json`	HuggingFace config
`tokenizer.json` / `tokenizer_config.json`	Fast tokenizer

Usage (onnxruntime)

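import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the tokenizer from the repository directory (tokenizer.json, tokenizer_config.json).
tok = AutoTokenizer.from_pretrained(".")
# Use "model.onnx" here instead to run the FP32 reference export.
sess = ort.InferenceSession("model_int8.onnx",
                            providers=["CPUExecutionProvider"])

enc = tok("your concept text here",
          return_tensors="np", padding=True, truncation=True, max_length=128)
# Feed only the inputs the graph declares, cast to int64.
input_names = {i.name for i in sess.get_inputs()}
feed = {k: v.astype(np.int64) for k, v in enc.items() if k in input_names}
logits = sess.run(None, feed)[0]  # first graph output: the classification logits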
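
The logits returned by the session are unnormalized scores. A minimal sketch of turning them into probabilities and a predicted class index follows, assuming the usual sequence-classification output of shape (batch, num_labels). The mapping from class index to label name is assumed to live in `id2label` in `config.json`, which this card does not show, and the `logits` array here is a made-up stand-in.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical logits for a batch of two texts and two classes;
# in practice use the `logits` array returned by sess.run(...).
logits = np.array([[-1.2, 2.3], [0.8, -0.4]], dtype=np.float32)

probs = softmax(logits)            # each row sums to 1
pred_ids = probs.argmax(axis=-1)   # index of the highest-scoring class per row
print(probs, pred_ids)
```

Check which index corresponds to `real_concept` in the shipped config before thresholding on a probability.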

Production context

This model is one of two classifiers used in the EXAMI knowledge-graph pipeline. The other (the merge verifier) handles same-as / merge classification.

For details on how this model fits into the broader incremental knowledge-graph architecture, see the merge-verifier model card and its accompanying CLUSTERING_STRATEGY.md and MERGE_AND_CLUSTERING_ARCHITECTURE.md documents.

Notes on int8 quantization — partial regression validated on test set

Validated on a 5,000-row stratified test sample (same seed=42 split as fp32):

	fp32 test (full 21,651)	int8 test (5k sample)	Δ
real_concept P	0.9361	0.9371	+0.0010 (tied)
real_concept R	0.9389	0.9045	−0.0344
macro_f0.5	0.9165	0.8944	−0.0221

The int8 model gives up recall at unchanged precision: it admits ~3.4% fewer of the valid concepts than fp32 (≈464 missed admissions per 13,552 valid concepts in test).
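
To make the trade-off concrete, the sketch below recomputes two quantities from the rounded table values. The first is the per-class F0.5 for `real_concept`, assuming `macro_f0.5` is the standard F-beta score with beta = 0.5, macro-averaged over classes. Only the `real_concept` precision and recall are listed, so the macro figure itself cannot be reproduced here. The second is the number of missed admissions implied by the recall gap. The function name is illustrative.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# real_concept precision/recall copied from the table above.
fp32_f05 = f_beta(0.9361, 0.9389)
int8_f05 = f_beta(0.9371, 0.9045)

# Recall gap applied to the 13,552 valid concepts in the test set.
valid_concepts = 13_552
missed = (0.9389 - 0.9045) * valid_concepts

print(f"F0.5 fp32={fp32_f05:.4f} int8={int8_f05:.4f} missed~{missed:.0f}")
```

With the rounded inputs the recall gap works out to roughly 466 missed admissions, close to the ≈464 quoted above. A small difference is expected, since the table values are rounded and the int8 recall comes from the 5k sample.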

Deployment guidance:

Use int8 if file size matters (151 MB vs 599 MB) and you can tolerate a 3.4% recall loss. The dropped concepts are recoverable via re-extraction from another document.
Use fp32 if you need maximum recall.
95.4% of MatMuls are properly quantized (vs 50.3% on DeBERTa-v3-large which is broken — see the v2 model card). ModernBERT's standard transformer architecture round-trips through quantize_dynamic cleanly.
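
The guidance above amounts to a simple decision rule. A hypothetical helper, not part of this repository, encodes it using the file sizes and `real_concept` recall figures reported in this card. The function name, parameters, and thresholds are illustrative assumptions.

```python
from typing import Optional

# Figures copied from this card: file size in MB and real_concept recall.
# Note that the int8 recall was measured on the 5k sample, not the full test set.
VARIANTS = {
    "model_int8.onnx": {"size_mb": 151, "recall": 0.9045},
    "model.onnx": {"size_mb": 599, "recall": 0.9389},
}

def choose_variant(max_size_mb: float, min_recall: float) -> Optional[str]:
    """Return the smallest file meeting both constraints, or None if none does."""
    candidates = [
        (spec["size_mb"], name)
        for name, spec in VARIANTS.items()
        if spec["size_mb"] <= max_size_mb and spec["recall"] >= min_recall
    ]
    return min(candidates)[1] if candidates else None

print(choose_variant(max_size_mb=200, min_recall=0.90))
print(choose_variant(max_size_mb=1000, min_recall=0.93))
```

Preferring the smallest qualifying file mirrors the card's framing: int8 is the default when its recall is acceptable, and fp32 is the fallback when recall matters most.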

Diagnostic script (for reproducing the integrity check):

from collections import Counter

import onnx

# Count node types in the quantized graph.
m = onnx.load("model_int8.onnx")
ops = Counter(n.op_type for n in m.graph.node)
fp32_mm = ops.get("MatMul", 0)         # MatMul nodes left in fp32
int8_mm = ops.get("MatMulInteger", 0)  # MatMul nodes converted to int8
total_mm = fp32_mm + int8_mm
pct = 100 * int8_mm / total_mm if total_mm else 0.0  # avoid dividing by zero
print(f"MatMul fp32 left: {fp32_mm}; MatMulInteger: {int8_mm}; "
      f"quantized %: {pct:.1f}")
# ModernBERT-base: 66.2% (with surrounding fp32 ops normal; model accuracy fine)
# DeBERTa-v3-large: 50.3% (with disentangled-attention partially fp32; broken)
