Concept Verifier (ONNX, int8)
A ModernBERT-base classifier fine-tuned for concept-level verification in the EXAMI knowledge-graph pipeline. Distributed in ONNX format with a dynamic int8 quantized variant for efficient CPU inference.
Files
| File | Purpose | Size |
|---|---|---|
| `model.onnx` | FP32 ONNX export (reference) | ~600 MB |
| `model_int8.onnx` | Dynamic int8 quantized for deployment | ~150 MB |
| `config.json` | HuggingFace config | |
| `tokenizer.json` / `tokenizer_config.json` | Fast tokenizer | |
Usage (onnxruntime)
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer
# Load the tokenizer from the current directory (the downloaded repo files).
tok = AutoTokenizer.from_pretrained(".")
sess = ort.InferenceSession("model_int8.onnx",
                            providers=["CPUExecutionProvider"])
enc = tok("your concept text here",
          return_tensors="np", padding=True, truncation=True, max_length=128)
# Feed only the inputs the ONNX graph declares, cast to int64.
input_names = {i.name for i in sess.get_inputs()}
feed = {k: v.astype(np.int64) for k, v in enc.items() if k in input_names}
logits = sess.run(None, feed)[0]
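The raw `logits` from the snippet above can be turned into class probabilities and a predicted class index with a softmax. The following is a minimal sketch on a made-up logits array: the helper name `softmax` and the example values are illustrative assumptions, and the mapping from class index to label name is not assumed here (it would normally come from `id2label` in `config.json`, if the exported config defines it).

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Illustrative logits with shape (batch, num_labels); in practice, pass the
# `logits` array returned by sess.run(...) instead of these made-up values.
example_logits = np.array([[2.0, -1.0], [0.5, 0.5]], dtype=np.float32)

probs = softmax(example_logits)    # each row sums to 1
pred_ids = probs.argmax(axis=-1)   # predicted class index per row
```

Subtracting the row maximum before exponentiating leaves the result unchanged but avoids overflow on large logits.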
Production context
This model is one of two classifiers used in the EXAMI knowledge-graph pipeline.
The other (the merge verifier) handles same-as / merge classification.
For details on how this model fits into the broader incremental knowledge-graph
architecture, see the merge-verifier model card and its accompanying
CLUSTERING_STRATEGY.md and MERGE_AND_CLUSTERING_ARCHITECTURE.md documents.
Notes on int8 quantization: partial regression validated on test set
Validated on a 5,000-row stratified test sample (same seed=42 split as fp32):
| Metric | fp32 test (full 21,651) | int8 test (5k sample) | Δ |
|---|---|---|---|
| real_concept P | 0.9361 | 0.9371 | +0.0010 (tied) |
| real_concept R | 0.9389 | 0.9045 | −0.0344 |
| macro_f0.5 | 0.9165 | 0.8944 | −0.0221 |
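For reference, an F0.5 score weights precision more heavily than recall. Below is a small sketch of the standard F-beta formula applied to the `real_concept` precision and recall from the table. The function name `f_beta` is illustrative, and the reported macro figure presumably averages this per-class score over all classes, so it cannot be reproduced from the single class shown here.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# real_concept precision/recall values taken from the table above.
fp32_f05 = f_beta(0.9361, 0.9389)
int8_f05 = f_beta(0.9371, 0.9045)
print(f"real_concept F0.5  fp32: {fp32_f05:.4f}  int8: {int8_f05:.4f}")
```

Because beta is below 1, the recall drop in the int8 model lowers this per-class score less than it would lower a plain F1.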
The int8 model gives up recall while precision is unchanged: it admits ~3.4% fewer valid concepts than fp32 (≈464 missed admissions per 13,552 valid concepts in test).
Deployment guidance:
- Use int8 if file size matters (151 MB vs 599 MB) and you can tolerate a 3.4% recall loss. The dropped concepts are recoverable via re-extraction from another document.
- Use fp32 if you need maximum recall.
- 95.4% of MatMuls are properly quantized (vs 50.3% on DeBERTa-v3-large, which is broken; see the v2 model card). ModernBERT's standard transformer architecture round-trips through `quantize_dynamic` cleanly.
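For context, a dynamic int8 variant such as `model_int8.onnx` is typically produced with ONNX Runtime's `quantize_dynamic`. The sketch below shows one plausible invocation. The exact options used for this model are not stated in this card, so the `QInt8` weight type and the wrapper name `quantize_to_int8` are assumptions.

```python
def quantize_to_int8(src: str = "model.onnx", dst: str = "model_int8.onnx") -> str:
    """Write a dynamically quantized copy of `src` to `dst` and return `dst`.

    Hypothetical wrapper: the weight type below is an assumption, not the
    documented setting for this model.
    """
    # Imported lazily so this module loads even without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(model_input=src, model_output=dst,
                     weight_type=QuantType.QInt8)
    return dst
```

Calling `quantize_to_int8()` in a directory that contains `model.onnx` would write the quantized file next to it; the MatMul count can then be used to confirm how much of the graph was converted.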
Diagnostic command (for reproducing the integrity check):
from collections import Counter
import onnx
m = onnx.load("model_int8.onnx")
ops = Counter(n.op_type for n in m.graph.node)
fp32_mm = ops.get("MatMul", 0)
int8_mm = ops.get("MatMulInteger", 0)
print(f"MatMul fp32 left: {fp32_mm}; MatMulInteger: {int8_mm}; "
      f"quantized %: {100*int8_mm/(fp32_mm+int8_mm):.1f}")
# ModernBERT-base: 66.2% (with surrounding fp32 ops normal; model accuracy fine)
# DeBERTa-v3-large: 50.3% (with disentangled-attention partially fp32; broken)
Base model for Bei0001/modernbert-concept-verifier-onnx: answerdotai/ModernBERT-base