CERT Hallucination Detection

Detects LLM hallucinations using embedding geometry and compares the results against a classifier baseline (HHEM-2.1-Open) on various benchmarks.

Methods compared

CERT SGI (with context): ratio of distances on the embedding hypersphere, dist(response, question) / dist(response, context). Evaluation needs no classifier inference: one embedding call and one division.

CERT DGI (without context): cosine similarity between the response displacement vector and the mean displacement of verified grounded pairs.

HHEM-2.1-Open (Vectara): fine-tuned flan-T5 classifier. Full model inference per evaluation call.

When they disagree

Disagreement surfaces Type III hallucinations — factual errors within a correct semantic frame. Embedding geometry cannot detect these: the response occupies the geometrically correct region of the space despite being factually wrong. HHEM's classifier may catch some of these cases. The two methods are orthogonal signals, not competing alternatives.
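One way to act on the two signals together is to route each response by which method flags it. The sketch below is illustrative only: the function name `triage`, the label strings, and the HHEM cutoff of 0.5 are assumptions that do not come from this Space. It also assumes the HHEM score is oriented so that higher means more consistent with the source.

```python
def triage(sgi_score, hhem_score, sgi_tau=0.95, hhem_tau=0.5):
    """Route a response based on which detector flags it.

    sgi_tau is the SGI threshold quoted in this document.
    hhem_tau is an assumed cutoff (higher score = more consistent).
    """
    geometry_ok = sgi_score >= sgi_tau
    classifier_ok = hhem_score >= hhem_tau

    if geometry_ok and classifier_ok:
        return "both_pass"
    if geometry_ok and not classifier_ok:
        # Geometrically plausible but rejected by the classifier:
        # a candidate Type III case (right frame, wrong facts).
        return "review_possible_type_iii"
    if not geometry_ok and classifier_ok:
        return "review_geometry_flag_only"
    return "both_flag"


print(triage(1.2, 0.1))
```

Treating a disagreement as a review queue rather than a verdict fits the point above that the two methods are complementary signals.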

Research & Theoretical Foundations

This tool is grounded in three intersecting research domains: geometric hallucination detection, legal AI benchmarking, and retrieval-augmented generation (RAG) faithfulness. The methods implemented here — SGI and DGI — are direct implementations of peer-reviewed work. The legal framing addresses a documented, high-stakes failure mode in deployed AI systems.


Geometric Hallucination Detection (Core Methods)

The CERT framework treats LLM outputs as vectors in a high-dimensional embedding space φ: T → ℝ^d and uses geometric properties of that space to detect grounding failures — without requiring a trained classifier or ground-truth labels.

Semantic Grounding Index (SGI). Defined as the ratio of distances in embedding space:

SGI(q, c, r) = ‖φ(r) − φ(q)‖ / ‖φ(r) − φ(c)‖

where q is the query, c is the source context (e.g., contract clause), and r is the LLM response. A grounded response should satisfy SGI ≥ τ (threshold τ = 0.95). Values above 1 mean the response lies closer to the context than to the question, so the 0.95 threshold also admits responses that sit marginally closer to the question.
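The SGI ratio can be sketched in a few lines of plain Python. This is a minimal illustration, not the Space's own code: the helper names and the toy 2-D vectors are assumptions that stand in for real embeddings φ(·), and returning infinity when the response coincides with the context is one possible convention.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sgi(q_vec, c_vec, r_vec):
    """SGI(q, c, r) = ||phi(r) - phi(q)|| / ||phi(r) - phi(c)||."""
    denom = euclidean(r_vec, c_vec)
    if denom == 0.0:
        # Response embedding coincides with the context embedding
        # (assumed convention: treat as maximally grounded).
        return math.inf
    return euclidean(r_vec, q_vec) / denom


TAU = 0.95  # threshold quoted in the text

# Toy vectors, illustrative only (real embeddings have many more dimensions).
q = [1.0, 0.0]
c = [0.0, 1.0]
r_grounded = [0.2, 0.9]  # lies near the context
r_drifting = [0.9, 0.2]  # lies near the question

sgi_grounded = sgi(q, c, r_grounded)
sgi_drifting = sgi(q, c, r_drifting)
print(sgi_grounded >= TAU, sgi_drifting >= TAU)
```

With these toy vectors the response near the context clears the threshold and the one near the question does not.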

Directional Grounding Index (DGI). When no source document is available, DGI measures whether the displacement vector Δ = φ(r) − φ(q) aligns with the mean displacement direction μ̂ of verified grounded pairs:

DGI(q, r) = (Δ / ‖Δ‖) · μ̂

A score below 0.30 indicates the response trajectory is anomalous relative to verified correct legal reasoning patterns — a geometric signal of confabulation.
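The DGI score can be sketched as follows. The text does not specify how μ̂ is estimated, so this sketch assumes it is the unit-normalized mean of unit displacements over the reference pairs. All function names and the toy 2-D vectors are illustrative, not taken from the Space.

```python
import math


def normalize(v):
    """Scale v to unit length; a zero vector has no direction."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def displacement(q_vec, r_vec):
    """Delta = phi(r) - phi(q)."""
    return [r - q for q, r in zip(q_vec, r_vec)]


def mean_direction(pairs):
    """mu_hat: normalized mean of unit displacements (assumed estimator)."""
    units = [normalize(displacement(q, r)) for q, r in pairs]
    dim = len(units[0])
    mean = [sum(u[i] for u in units) / len(units) for i in range(dim)]
    return normalize(mean)


def dgi(q_vec, r_vec, mu_hat):
    """DGI(q, r) = (Delta / ||Delta||) . mu_hat."""
    delta_hat = normalize(displacement(q_vec, r_vec))
    return sum(d * m for d, m in zip(delta_hat, mu_hat))


# Toy reference set of verified grounded (question, response) embeddings.
reference_pairs = [([0.0, 0.0], [1.0, 0.0]), ([0.0, 0.0], [1.0, 1.0])]
mu_hat = mean_direction(reference_pairs)

dgi_aligned = dgi([0.0, 0.0], [2.0, 1.0], mu_hat)
dgi_anomalous = dgi([0.0, 0.0], [-1.0, 1.0], mu_hat)
print(dgi_aligned >= 0.30, dgi_anomalous >= 0.30)
```

Because both vectors are unit length, the score is a cosine similarity in [-1, 1], which is what makes a fixed cutoff such as 0.30 meaningful.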

Rotational Constraint Processing. Companion work explaining why transformer attention geometry produces these detectable displacement patterns: grounded responses exhibit measurable rotational alignment with factual constraint directions in the residual stream.


Hallucination — Foundational Literature

Survey of Hallucination in Natural Language Generation. The canonical taxonomy paper. Classifies hallucinations as intrinsic (contradicts source) vs. extrinsic (adds unverifiable content), a distinction directly relevant to contract review, where both failure modes carry legal risk.

TruthfulQA: Measuring How Models Mimic Human Falsehoods. A benchmark showing that larger models are not necessarily more truthful and can be better at producing plausible falsehoods. Directly relevant to legal AI, where fluency and legal vocabulary mask factual errors.

P(truthful | fluent) ≠ P(truthful)

Siren's Song in the AI Ocean: A Survey on Hallucination in LLMs. Covers hallucination across the full model lifecycle: pretraining data bias, decoding strategies, and RLHF alignment failures. Includes a mitigation taxonomy with retrieval, calibration, and post-hoc verification approaches.


Legal AI Benchmarking

LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models. 162 tasks spanning statutory reasoning, contract interpretation, and rule application, assembled by 40+ legal professionals. Establishes baseline performance gaps between general-purpose LLMs and legally reliable reasoning. Directly motivates hallucination detection as a required layer over any legal AI system.

CUAD: An Expert-Annotated NLP Dataset for Legal Contract Understanding. 510 commercial contracts annotated by legal experts across 41 clause categories. The standard benchmark for contract clause extraction and understanding, the task this tool's SGI scoring is designed to protect.


RAG Faithfulness

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. The foundational RAG paper. Defines the architecture that SGI is designed to audit: a retriever p_η(z|x) selects documents z given query x, and a generator p_θ(y|x,z) conditions on both. SGI detects when the generator fails to condition on z, the core faithfulness failure in document-grounded legal AI.

p(y|x) = Σ_z p_η(z|x) · p_θ(y|x,z)
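The marginalization above can be worked through with toy numbers. The sketch assumes a finite set of retrieved documents. The document ids and probabilities are invented for illustration, and `marginal_likelihood` is a hypothetical helper name.

```python
def marginal_likelihood(retriever_probs, generator_probs):
    """p(y|x) = sum over z of p_eta(z|x) * p_theta(y|x,z).

    Both arguments map a document id z to a probability.
    """
    if retriever_probs.keys() != generator_probs.keys():
        raise ValueError("both inputs must cover the same documents")
    return sum(retriever_probs[z] * generator_probs[z] for z in retriever_probs)


# p_eta(z|x): how likely the retriever is to select each document.
retriever = {"doc_a": 0.5, "doc_b": 0.25, "doc_c": 0.25}

# p_theta(y|x,z): likelihood of one fixed answer y given each document.
conditioned = {"doc_a": 0.8, "doc_b": 0.4, "doc_c": 0.0}

# A generator that ignores z assigns y the same likelihood for every document.
ignoring_z = {"doc_a": 0.6, "doc_b": 0.6, "doc_c": 0.6}

p_conditioned = marginal_likelihood(retriever, conditioned)
p_ignoring = marginal_likelihood(retriever, ignoring_z)
print(p_conditioned, p_ignoring)
```

In the second case the sum collapses to the generator's own likelihood for y, so the retrieved documents have no influence on the answer. That is the faithfulness failure described above.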

Case Law Context

Mata v. Avianca, No. 22-cv-1461 (S.D.N.Y. 2023). Attorneys submitted a brief citing six fabricated cases generated by ChatGPT, and the court imposed sanctions. Each of the six, including its purported holdings and quotations, was a hallucination. This is the canonical real-world example of extrinsic hallucination in a legal context: the model produced fluent, jurisdiction-appropriate, entirely fictional legal authority.

This case motivates the core design principle of this tool: hallucination detection must run before any AI-generated legal content is relied upon, not after.

Dashboard

cert-framework.com
