HHEM-2.1-Open (ONNX)

An ONNX export of HHEM-2.1-Open (Vectara's Hughes Hallucination Evaluation Model: ~110M parameters, a FLAN-T5-base trunk with a token-classification head), for in-process inference without a Python/PyTorch runtime or trust_remote_code.

Why this exists

Upstream HHEM-2.1-Open ships safetensors weights plus custom modeling code that requires trust_remote_code (a wrapper around T5ForTokenClassification that slices the position-0 logits). optimum-cli will not export that wrapper, so no public ONNX build existed. This repo is that build: anyone who wants to experiment with HHEM-2.1-Open can do so without standing up a PyTorch runtime, custom modeling code, or a custom export pathway.

It reflects a Familiar Tools belief: a specialized, right-sized model that runs efficiently and in-process beats reaching for a large, general, resource-hungry one. Exporting a focused model to ONNX is part of that, because it makes the model cheap to run, easy to embed, and light on dependencies. Custom, deliberately engineered solutions tend to be more efficient and more resource-aware than general-purpose defaults.

Files

The model was exported at opset 17 by bypassing the custom HHEMv2ForSequenceClassification wrapper and exporting the inner T5ForTokenClassification directly. The position-0 logit slice and the softmax are therefore not part of the graph; the caller applies them.

| File | Notes |
| --- | --- |
| model.onnx (~419 MB) | T5 encoder + token-classification head. Inputs: input_ids, attention_mask (both [batch, seq], dynamic). Output: logits [batch, seq, 2]. The consistency score is softmax(logits[:, 0, :])[:, 1], i.e. the class-1 probability at position 0 for each batch item. |
| tokenizer.json | FLAN-T5-base fast tokenizer (loads with the Rust tokenizers crate). |
| tokenizer_config.json, special_tokens_map.json | Tokenizer metadata. |
| MODEL_REVISION.txt, sha256.txt | Upstream commit SHA + source weights SHA-256 for provenance. |
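The provenance digest can be checked with a few lines of standard-library Python. This is a minimal sketch, not a script shipped with this repo: the helper names are hypothetical, and it assumes sha256.txt holds the hex digest as a whitespace-separated token (for example in `sha256sum` style), which the file listing above does not specify. Per that listing the digest covers the upstream source weights, so the file to hash would be the upstream weights file rather than model.onnx.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large weight files never sit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(path, recorded_text):
    """Return True if the file's digest appears as a token in recorded_text.

    ASSUMPTION: recorded_text is the content of sha256.txt and contains the
    hex digest as a whitespace-separated token; the real layout may differ.
    """
    actual = sha256_of_file(path)
    tokens = recorded_text.lower().split()
    return actual in tokens
```

A caller would read sha256.txt as text and pass its contents as `recorded_text` together with the path of the downloaded weights.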

Source upstream revision: 8e4a2e6e96c708cc76c2344f7e4757df2515292c. Inference uses the HHEM prompt template (a prefix containing a literal <pad> token between premise and hypothesis), as in the upstream model.
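Because the export stops at the raw logits, the caller has to apply the position-0 slice and the softmax. The sketch below shows that post-processing in plain Python on a nested list shaped [batch, seq, 2]; the function name is hypothetical, and prompt construction and tokenization are deliberately left out.

```python
import math


def consistency_scores(logits):
    """Turn raw logits shaped [batch, seq, 2] into one score per batch item.

    For each item, take the logits at sequence position 0, apply a
    numerically stable softmax over the two classes, and keep class 1.
    """
    scores = []
    for sequence in logits:
        first = sequence[0]          # position-0 logits: [class 0, class 1]
        peak = max(first)            # subtract the max to avoid overflow
        exps = [math.exp(value - peak) for value in first]
        scores.append(exps[1] / sum(exps))
    return scores


# Equal logits at position 0 give a score of 0.5; later positions are ignored.
example = [[[0.0, 0.0], [5.0, -5.0]]]
print(consistency_scores(example))
```

If the logits come back from a runtime as a NumPy array, calling `.tolist()` on it yields a nested list this function accepts.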

Parity

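The export was validated against the PyTorch reference on corpus pairs with |delta_p_consistent| < 1e-3, where delta_p_consistent is the difference between the ONNX and PyTorch consistency probabilities for the same pair.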
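A parity check of this kind can be reproduced once scores from both runtimes are available for the same pairs. The following is a sketch under that assumption, not the validation script used for this export; the function names are hypothetical and only the 1e-3 tolerance comes from the text above.

```python
def max_abs_delta(reference_scores, onnx_scores):
    """Largest absolute difference between paired consistency scores."""
    if len(reference_scores) != len(onnx_scores):
        raise ValueError("score lists must have the same length")
    return max(
        (abs(ref - onnx) for ref, onnx in zip(reference_scores, onnx_scores)),
        default=0.0,  # no pairs means no observed difference
    )


def within_parity(reference_scores, onnx_scores, tolerance=1e-3):
    """True if every pair differs by strictly less than the tolerance."""
    return max_abs_delta(reference_scores, onnx_scores) < tolerance
```

Reporting the maximum delta rather than the mean keeps a single badly diverging pair from being averaged away.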

License and attribution

Released under the Apache-2.0 License, matching upstream.

This repo redistributes a derivative (ONNX export) of upstream HHEM-2.1-Open under the same Apache-2.0 terms. Weights were not retrained or modified; only the inference graph was re-expressed.
