zjudge: ONNX NLI Models
Status: work in progress. A stable release is expected in mid-June 2026. Currently only the default NLI models are available.
ONNX-exported cross-encoder NLI models used by zer, a zero-shot entity resolution pipeline built for Dutch-centric data in law enforcement, medical, defence, and government contexts across different municipalities.
Models are exported from HuggingFace cross-encoders using models/generate_onnx_model.py and optimized with Optimum.
Precision support
zjudge targets three inference precisions:
| Precision | Status | Notes |
|---|---|---|
| FP32 | available | CPU baseline, no optimization passes |
| FP16 | available | TensorRT EP / CUDA Tensor Core, level-1 or level-2 fused graph |
| INT8 | planned | quantized inference, expected mid-June 2026 |
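The table above maps each precision to a runtime target. As a minimal sketch of how a caller might turn that into an ONNX Runtime provider list: the `PREFERRED` mapping and the `pick_providers` helper are hypothetical names for illustration and are not part of zjudge. The provider strings are the standard ONNX Runtime execution provider names. INT8 is omitted because it is not released yet.

```python
from typing import List, Sequence

# Preferred ONNX Runtime execution providers per precision, most specific first.
# This mapping is an assumption for illustration, not zjudge's actual logic.
PREFERRED = {
    "fp32": ["CPUExecutionProvider"],
    "fp16": [
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
}


def pick_providers(precision: str, available: Sequence[str]) -> List[str]:
    """Return the preferred providers that are actually available, in order."""
    if precision not in PREFERRED:
        raise ValueError(f"unsupported precision: {precision!r}")
    chosen = [p for p in PREFERRED[precision] if p in available]
    if not chosen:
        raise RuntimeError(f"no execution provider available for {precision}")
    return chosen


# In real use, `available` would come from onnxruntime.get_available_providers().
providers = pick_providers("fp16", ["CUDAExecutionProvider", "CPUExecutionProvider"])
print(providers)
```

The resulting list can be passed as the `providers` argument when creating an `onnxruntime.InferenceSession`.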
Training data
Models are fine-tuned on a synthetic dataset generated by zsimulate, covering Dutch-language entity resolution scenarios for four domains:
- Law enforcement: incident reports, suspect/victim records, case cross-references
- Medical: patient records, referral letters, diagnostic notes
- Defence: personnel files, operational logs, asset registries
- Government: municipal administration records across different Dutch municipalities
Repository layout
nli-base/
    base/          # FP32, no optimization: CPU inference baseline
    fp16/          # FP16 weights, no GPU-specific fusions
    fp16_fused/    # FP16 + level-2 fusions: TensorRT EP / CUDA Tensor Core
Each subdirectory contains a model.onnx alongside its tokenizer and config files, ready for use with ONNX Runtime.
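A small sketch of how the layout above resolves to a concrete path. It assumes each model lives at `nli-base/<variant>/<model-name>/model.onnx`, which matches the subfolder used in the usage example. The helper names `model_subfolder` and `model_file` are hypothetical.

```python
from pathlib import PurePosixPath

# Variant directories listed in the repository layout.
VARIANTS = ("base", "fp16", "fp16_fused")


def model_subfolder(variant: str, model_name: str) -> str:
    """Build the repository-relative folder for one model variant."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return str(PurePosixPath("nli-base") / variant / model_name)


def model_file(variant: str, model_name: str) -> str:
    """Build the repository-relative path to the model.onnx file."""
    return str(PurePosixPath(model_subfolder(variant, model_name)) / "model.onnx")


print(model_subfolder("fp16_fused", "nli-minilm-onnx"))
print(model_file("base", "nli-deberta-v3-base"))
```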
Included models
| Model | Architecture | Labels |
|---|---|---|
| nli-minilm-onnx | RoBERTa (6L, hidden 768) | contradiction / entailment / neutral |
| nli-deberta-v3-base | DeBERTa-v3 | contradiction / entailment / neutral |
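Both models emit three logits per premise/hypothesis pair. The sketch below shows one way to turn those logits into label probabilities. It assumes the output index order is contradiction, entailment, neutral, as listed in the table; in practice, read the `id2label` mapping from each model's config file instead of hard-coding it. The logit values are made up.

```python
import math
from typing import Dict, List, Sequence

# Assumed index order; verify against id2label in the model's config file.
LABELS = ("contradiction", "entailment", "neutral")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_scores(logits: Sequence[float]) -> Dict[str, float]:
    """Map raw NLI logits to a label -> probability dictionary."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return dict(zip(LABELS, softmax(logits)))


scores = label_scores([-1.2, 3.4, 0.3])  # illustrative logits only
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```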
Usage
Download all variants with:
bash scripts/download_models.sh
Or load a model directly with ONNX Runtime through Optimum:
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

# Load the ONNX model from the chosen variant subfolder
model = ORTModelForSequenceClassification.from_pretrained(
    "arsalan-anwari/zjudge",
    subfolder="nli-base/fp16_fused/nli-minilm-onnx",
)
# The tokenizer files sit next to model.onnx in the same subfolder
tokenizer = AutoTokenizer.from_pretrained(
    "arsalan-anwari/zjudge",
    subfolder="nli-base/fp16_fused/nli-minilm-onnx",
)

# Each candidate label is scored as an NLI hypothesis against the input text
clf = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)
clf("Amsterdam is the capital of the Netherlands.", candidate_labels=["geography", "sports"])
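To make the zero-shot call above less opaque, here is a simplified, dependency-free sketch of what the `zero-shot-classification` pipeline does in single-label mode: it pairs the input with one hypothesis per candidate label (the default template in Transformers is `"This example is {}."`) and applies a softmax over the entailment logits across labels. The helper names and the logit values are hypothetical; real logits come from the NLI model.

```python
import math
from typing import List, Sequence, Tuple


def build_pairs(
    text: str,
    candidate_labels: Sequence[str],
    template: str = "This example is {}.",
) -> List[Tuple[str, str]]:
    """Create one (premise, hypothesis) pair per candidate label."""
    return [(text, template.format(label)) for label in candidate_labels]


def rank_labels(
    candidate_labels: Sequence[str],
    entailment_logits: Sequence[float],
) -> List[Tuple[str, float]]:
    """Softmax the entailment logits across labels and sort descending."""
    m = max(entailment_logits)
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    scored = [(label, e / total) for label, e in zip(candidate_labels, exps)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


labels = ["geography", "sports"]
pairs = build_pairs("Amsterdam is the capital of the Netherlands.", labels)
ranked = rank_labels(labels, [2.5, -1.0])  # made-up entailment logits
print(pairs[0][1])
print(ranked[0][0])
```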
Exporting models yourself
pip install -r models/requirements.txt

# FP32 baseline
python models/generate_onnx_model.py \
    -m cross-encoder/nli-deberta-v3-base \
    -o models/nli-base/base/nli-deberta-v3-base/

# FP16 + fused (TensorRT / CUDA Tensor Core)
python models/generate_onnx_model.py \
    -m cross-encoder/nli-deberta-v3-base \
    -o models/nli-base/fp16_fused/nli-deberta-v3-base/ \
    --fp16 --optimization-level 1