EU-Halt heads for `microsoft/Phi-3.5-mini-instruct` (`default`)

Lightweight epistemic-uncertainty detector: K=4 prediction heads sharing the frozen microsoft/Phi-3.5-mini-instruct trunk. Configuration: mid_dim=128, K=4 (default).

Quick start

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from eu_halt import attach

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct", torch_dtype=torch.bfloat16,
).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")

uncertainty = attach(
    model,
    heads_repo="debajyotidasgupta/eu-halt-phi-3.5-mini-instruct",
    mid_dim=128,
)
print(uncertainty("Who founded Quora in 2008?", tokenizer))
# Higher = more uncertain.

Files in this repo

heads_final.pt — final K=4 head state dict.
heads_step{500,1000,1500,2000,2500}.pt — intermediate checkpoints.
source_layers.json — the 4 trunk-layer indices the heads read from.
history.json — per-step loss + disagreement + GPU stats.
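
The files listed above can be fetched one at a time with `huggingface_hub`. The sketch below is a minimal illustration, not part of the `eu_halt` package: `hf_hub_download` is the real Hub call, while `checkpoint_filenames`, `load_source_layers`, and `fetch` are hypothetical helpers, and the assumption that source_layers.json holds a flat JSON list of integers is not confirmed by this card.

```python
import json

REPO_ID = "debajyotidasgupta/eu-halt-phi-3.5-mini-instruct"


def checkpoint_filenames(steps=(500, 1000, 1500, 2000, 2500)):
    """Expand the heads_step{...}.pt pattern listed above, then add the final file."""
    return [f"heads_step{s}.pt" for s in steps] + ["heads_final.pt"]


def load_source_layers(path):
    """Read source_layers.json.

    ASSUMPTION: the file is a flat JSON list of trunk-layer indices.
    """
    with open(path) as f:
        layers = json.load(f)
    return [int(i) for i in layers]


def fetch(filename, repo_id=REPO_ID):
    """Download one file from the Hub and return its local cache path."""
    # Imported lazily so the helpers above work without the package or network.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=repo_id, filename=filename)
```

With network access, `load_source_layers(fetch("source_layers.json"))` should return the layer indices if the file has the assumed layout, and a downloaded `.pt` checkpoint can be opened with `torch.load(path, map_location="cpu")`.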

Training

Dataset: HuggingFaceFW/fineweb-edu (streaming).
~2-5M tokens, batch_size 2-4, seq_len 512, ~2000-2500 steps.
AdamW (lr 3e-4 to 5e-4), 100-200 warmup steps.
K=4 heads, mid_dim=128, training_noise_std=0.01, dropout=0.1 (both set to 0 for the quiet variants).
Single GPU (~10-15 min on RTX A5000/A6000/L40S).
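
The token budget above follows from batch size, sequence length, and step count. The sketch below checks that arithmetic and shows one plausible warmup shape. The linear ramp followed by a constant rate is an assumption: this card states only the warmup length and the peak learning rates, not the schedule. Both function names are hypothetical.

```python
def tokens_seen(batch_size, seq_len, steps):
    """Upper bound on tokens processed, assuming every sequence is full length."""
    return batch_size * seq_len * steps


def warmup_lr(step, peak_lr, warmup_steps):
    """Linear warmup to peak_lr, then constant.

    ASSUMPTION: the card does not specify the warmup shape or any later decay.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr


# Smallest and largest configurations quoted in the Training section.
low = tokens_seen(batch_size=2, seq_len=512, steps=2000)
high = tokens_seen(batch_size=4, seq_len=512, steps=2500)
print(f"{low / 1e6:.2f}M to {high / 1e6:.2f}M tokens")
```

The two extremes come to 2,048,000 and 5,120,000 tokens, which matches the stated ~2-5M range.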

Evaluation

OOD-detection AUROC (in-distribution vs. out-of-distribution inputs), 2164 samples total:

Signal	AUROC	95% CI
disagreement	0.7420	[0.7193, 0.7614]
entropy	0.5382	[0.5150, 0.5612]
last_token_unc	0.4002	[0.3691, 0.4337]
mahalanobis	1.0000	[1.0000, 1.0000]
p_true	nan	[nan, nan]
semantic_entropy	nan	[nan, nan]

Best signal: mahalanobis
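
For readers who want to reproduce numbers of this kind on their own data, the sketch below computes AUROC and a 95% interval using only the standard library. It assumes a percentile bootstrap; this card does not say how its intervals were obtained, so treat the method, the function names, and the toy data as illustrative only.

```python
import random


def auroc(scores, labels):
    """Probability that a random positive (label 1, e.g. OOD) scores higher
    than a random negative (label 0), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when only one class is present
    wins = 0.0
    for p in pos:  # quadratic in sample count; fine for a sketch
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (ASSUMPTION: not necessarily the card's method)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        a = auroc([scores[i] for i in idx], [labels[i] for i in idx])
        if a == a:  # drop NaN resamples that contain a single class
            stats.append(a)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


# Toy data: label 1 marks OOD inputs, higher score = more uncertain.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0, 0, 1, 1, 1, 0]
print(auroc(scores, labels), bootstrap_ci(scores, labels))
```

When scores separate the two classes perfectly, every resample that contains both classes also yields 1.0, so the bootstrap interval collapses to a single point.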

Intended use

Hallucination flagging at inference time (score before / during generation).
Dynamic-RAG gating (retrieve iff uncertainty > τ).
Selective prediction / risk-coverage trade-offs.
Token-level uncertainty visualization via uncertainty.per_token(text, tokenizer).
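
The gating and selective-prediction uses above can be sketched without the model, given a list of uncertainty scores. The helpers `should_retrieve` and `risk_coverage` below are hypothetical and not part of `eu_halt`; the scores and error flags are made-up toy values, and the threshold τ has to be chosen on held-out data.

```python
def should_retrieve(uncertainty, tau):
    """Dynamic-RAG gate: retrieve only when the score exceeds the threshold."""
    return uncertainty > tau


def risk_coverage(scores, errors):
    """Accept examples from least to most uncertain and track the error rate.

    scores: uncertainty per example (higher = more uncertain).
    errors: 1 if the model's answer was wrong, else 0.
    Returns (coverage, risk) points, one per accepted example.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    points, wrong = [], 0
    for k, i in enumerate(order, start=1):
        wrong += errors[i]
        points.append((k / len(scores), wrong / k))
    return points


# Toy values: the two most uncertain answers are the wrong ones.
scores = [0.05, 0.9, 0.3, 0.6]
errors = [0, 1, 0, 1]
for coverage, risk in risk_coverage(scores, errors):
    print(f"coverage={coverage:.2f} risk={risk:.2f}")
```

A useful uncertainty signal keeps risk low at low coverage and lets it rise only as the most uncertain examples are admitted. A flat curve suggests the score carries little information about correctness.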

Limitations

No fine-tuning of the trunk — only the auxiliary heads are trained.
Heads are trained on web text. Specialized domains (medical, legal) may need a domain-specific recalibration.
The head output projection scales with vocabulary size: for Gemma's 256k vocab it is ~70-100M params per head, still small relative to the trunk.

License

Apache-2.0 for the heads. The trunk model microsoft/Phi-3.5-mini-instruct retains its own license; see the base model's card for its terms.

Citation

@misc{ais_eu_halt_2026,
  author = {Dasgupta, Debajyoti and Anthropic Claude},
  title  = {EU-Halt: Lightweight Multi-Head Epistemic Detectors for Frozen LLMs},
  year   = {2026},
  url    = {https://huggingface.co/debajyotidasgupta/eu-halt-phi-3.5-mini-instruct},
}
