HHEM-2.1-Open (ONNX)
An ONNX export of HHEM-2.1-Open
(Vectara's Hughes Hallucination Evaluation Model, ~110M parameters, FLAN-T5-base trunk with
a token-classification head), for in-process inference without a Python/PyTorch
runtime or trust_remote_code.
Why this exists
Upstream HHEM-2.1-Open ships safetensors plus trust_remote_code custom
modeling (a wrapper around T5ForTokenClassification that slices position-0
logits), which optimum-cli will not export, so no public ONNX build existed.
This repo is that build: anyone who wants to experiment with HHEM-2.1-Open can
do so without standing up a PyTorch runtime, custom modeling code, or a custom
export pathway.
It reflects a Familiar Tools belief: a specialized, right-sized model that runs efficiently and in-process beats reaching for a large, general, resource-hungry one. Exporting a focused model to ONNX is part of that: it makes the model cheap to run, easy to embed, and light on dependencies. Custom, deliberately engineered solutions tend to be more efficient and more resource-aware than general-purpose defaults.
Files
Exported (opset 17) by bypassing the custom HHEMv2ForSequenceClassification
wrapper and exporting the inner T5ForTokenClassification directly; the
position-0 logit slice + softmax are applied by the caller.
| File | Notes |
|---|---|
| model.onnx (~419 MB) | T5 encoder + token-classification head. Inputs: input_ids, attention_mask (both [batch, seq], dynamic). Output: logits [batch, seq, 2]. The consistency score is softmax(logits[:, 0, :])[:, 1], i.e. the class-1 probability at position 0 for each batch item. |
| tokenizer.json | FLAN-T5-base fast tokenizer (loads with the Rust tokenizers crate). |
| tokenizer_config.json, special_tokens_map.json | Tokenizer metadata. |
| MODEL_REVISION.txt, sha256.txt | Upstream commit SHA + source weights SHA-256 for provenance. |
Source upstream revision: 8e4a2e6e96c708cc76c2344f7e4757df2515292c.
Inference uses the HHEM prompt template (a prefix containing a literal <pad>
token between premise and hypothesis), as in the upstream model.
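The caller-side post-processing described in this section (take the position-0 logits, apply softmax, read class index 1) can be sketched as follows. This is a minimal illustration under stated assumptions, not code shipped with this repo: the function name `consistency_scores` is hypothetical, the toy logits are made up, and in practice the `[batch, seq, 2]` array would come from running model.onnx in an ONNX runtime. Python is used here only to show the arithmetic, which ports directly to any host language.

```python
import math


def consistency_scores(logits):
    """Map logits shaped [batch][seq][2] to one P(consistent) per example.

    Hypothetical helper: only the position-0 logits of each example are used,
    and class index 1 is read as the consistency probability.
    """
    scores = []
    for example in logits:
        l0, l1 = example[0]            # position-0 logit pair
        m = max(l0, l1)                # subtract the max for numerical stability
        e0 = math.exp(l0 - m)
        e1 = math.exp(l1 - m)
        scores.append(e1 / (e0 + e1))  # softmax probability of class 1
    return scores


# Made-up logits for a batch of 2 with sequence length 3.
toy = [
    [[0.0, 0.0], [5.0, -5.0], [1.0, 2.0]],
    [[-2.0, 2.0], [0.0, 0.0], [0.0, 0.0]],
]
print(consistency_scores(toy))
```

Logits at positions other than 0 are ignored, so the first toy example scores 0.5 regardless of its later rows.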
Parity
The export was validated against the PyTorch reference on corpus pairs with
|delta_p_consistent| < 1e-3, i.e. the ONNX and PyTorch consistency
probabilities differ by less than 0.001.
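A check of this kind can be sketched as below. It is an illustrative reconstruction, not the validation script actually used: the helper names `max_abs_delta` and `within_tolerance` are hypothetical, the score lists are made-up numbers, and comparing the maximum absolute difference against the tolerance is an assumption about how the threshold is applied.

```python
def max_abs_delta(reference, candidate):
    """Largest absolute difference between paired consistency scores."""
    if len(reference) != len(candidate):
        raise ValueError("score lists must have the same length")
    if not reference:
        raise ValueError("score lists must not be empty")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def within_tolerance(reference, candidate, tol=1e-3):
    """True if every paired score differs by less than tol."""
    return max_abs_delta(reference, candidate) < tol


# Made-up P(consistent) values for three premise/hypothesis pairs.
pytorch_scores = [0.9731, 0.0412, 0.5000]
onnx_scores = [0.9733, 0.0410, 0.5004]

print(within_tolerance(pytorch_scores, onnx_scores))
```

To rerun such a comparison, both score lists would need to be produced from the same inputs with the same prompt template and tokenizer.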
License and attribution
Released under the Apache-2.0 License, matching upstream.
- HHEM-2.1-Open by Vectara: vectara/hallucination_evaluation_model.
- Base model: google/flan-t5-base.
This repo redistributes a derivative (ONNX export) of the above under the same Apache-2.0 terms. Weights were not retrained or modified; only the inference graph was re-expressed.