zjudge: ONNX NLI Models

Status: work in progress. A stable release is expected in mid-June 2026. Currently, only the default NLI models are available.

zjudge provides the ONNX-exported cross-encoder NLI models used by zer, a zero-shot entity resolution pipeline built for Dutch-centric data in law enforcement, medical, defence, and government contexts, the last spanning different municipalities.

Models are exported from HuggingFace cross-encoders using models/generate_onnx_model.py and optimized with Optimum.

Precision support

zjudge targets three inference precisions:

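| Precision | Status    | Notes                                                              |
|-----------|-----------|--------------------------------------------------------------------|
| FP32      | available | CPU baseline, no optimization passes                               |
| FP16      | available | TensorRT EP / CUDA Tensor Core, level-1 or level-2 fused graph     |
| INT8      | planned   | Quantized inference, coming mid-June 2026                          |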
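As a rough illustration of how the available precisions relate to hardware, the sketch below picks a precision from the execution providers present on a host. The provider strings are standard ONNX Runtime identifiers, but the `PROVIDERS_BY_PRECISION` mapping and the `pick_precision` helper are hypothetical and not part of zjudge.

```python
# Preference-ordered ONNX Runtime execution providers per precision.
# Assumption: this mapping mirrors the precision notes above; it is not shipped with zjudge.
PROVIDERS_BY_PRECISION = {
    "fp32": ["CPUExecutionProvider"],
    "fp16": ["TensorrtExecutionProvider", "CUDAExecutionProvider"],
}


def pick_precision(available):
    """Return (precision, providers) for the given list of available providers.

    FP16 is chosen when at least one GPU provider is present; otherwise the
    function falls back to the FP32 CPU baseline.
    """
    gpu = [p for p in PROVIDERS_BY_PRECISION["fp16"] if p in available]
    if gpu:
        return "fp16", gpu
    return "fp32", PROVIDERS_BY_PRECISION["fp32"]


# With ONNX Runtime installed, pass onnxruntime.get_available_providers() instead
# of a hand-written list.
print(pick_precision(["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

The returned provider list keeps TensorRT ahead of CUDA when both are present, which matches the order in which ONNX Runtime tries providers.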

Training data

Models are fine-tuned on a synthetic dataset generated by zsimulate, covering Dutch-language entity resolution scenarios for four domains:

  • Law enforcement: incident reports, suspect/victim records, case cross-references
  • Medical: patient records, referral letters, diagnostic notes
  • Defence: personnel files, operational logs, asset registries
  • Government: municipal administration records across different Dutch municipalities

Repository layout

nli-base/
  base/          # FP32, no optimization: CPU inference baseline
  fp16/          # FP16 weights, no GPU-specific fusions
  fp16_fused/    # FP16 + level-2 fusions: TensorRT EP / CUDA Tensor Core

Each subdirectory contains a model.onnx alongside its tokenizer and config files, ready for use with ONNX Runtime.

Included models

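| Model               | Architecture             | Labels                               |
|---------------------|--------------------------|--------------------------------------|
| nli-minilm-onnx     | RoBERTa (6L, hidden 768) | contradiction / entailment / neutral |
| nli-deberta-v3-base | DeBERTa-v3               | contradiction / entailment / neutral |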
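Both models emit three logits per premise/hypothesis pair. A minimal sketch of turning those logits into a label follows. It assumes the output indices follow the order listed above (contradiction, entailment, neutral); the `id2label` entry in each model's config file is the authoritative mapping and should be checked first. The `decode_nli` helper and the example logits are illustrative only.

```python
import math

# Assumption: index order matches the label order listed for the included models.
LABELS = ["contradiction", "entailment", "neutral"]


def decode_nli(logits):
    """Apply a numerically stable softmax and return (best_label, probabilities)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return LABELS[best], dict(zip(LABELS, probs))


# Made-up logits for illustration, not real model output.
label, probs = decode_nli([-2.0, 3.5, 0.1])
print(label, probs)
```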

Usage

Download all variants with:

bash scripts/download_models.sh

Or load a single variant directly through Optimum's ONNX Runtime integration:

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

# Model and tokenizer are loaded from the same variant subfolder of the repository.
model = ORTModelForSequenceClassification.from_pretrained(
    "arsalan-anwari/zjudge",
    subfolder="nli-base/fp16_fused/nli-minilm-onnx",
)
tokenizer = AutoTokenizer.from_pretrained(
    "arsalan-anwari/zjudge",
    subfolder="nli-base/fp16_fused/nli-minilm-onnx",
)

# The zero-shot pipeline scores each candidate label through NLI entailment.
clf = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)
result = clf("Amsterdam is the capital of the Netherlands.", candidate_labels=["geography", "sports"])
print(result)
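For readers unfamiliar with how a zero-shot pipeline reuses an NLI model, the sketch below mirrors the general approach in simplified form: each candidate label is turned into a hypothesis, the model scores every (text, hypothesis) pair, and the entailment logits are normalized across labels. The template string is the default used by the Transformers zero-shot pipeline; the helper names and the logits are hypothetical, and for Dutch input a Dutch hypothesis template may be a better fit.

```python
import math


def build_pairs(text, candidate_labels, template="This example is {}."):
    """Pair the input text (premise) with one hypothesis per candidate label."""
    return [(text, template.format(label)) for label in candidate_labels]


def zero_shot_scores(entail_logits_by_label):
    """Softmax over the entailment logits of all candidate labels (single-label case)."""
    peak = max(entail_logits_by_label.values())
    exps = {k: math.exp(v - peak) for k, v in entail_logits_by_label.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


pairs = build_pairs(
    "Amsterdam is the capital of the Netherlands.",
    ["geography", "sports"],
)
# Made-up entailment logits for illustration, not real model output.
scores = zero_shot_scores({"geography": 2.1, "sports": -1.3})
print(pairs)
print(scores)
```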

Exporting models yourself

pip install -r models/requirements.txt

# FP32 baseline
python models/generate_onnx_model.py \
    -m cross-encoder/nli-deberta-v3-base \
    -o models/nli-base/base/nli-deberta-v3-base/

# FP16 + fused (TensorRT / CUDA Tensor Core)
python models/generate_onnx_model.py \
    -m cross-encoder/nli-deberta-v3-base \
    -o models/nli-base/fp16_fused/nli-deberta-v3-base/ \
    --fp16 --optimization-level 1