EU-Halt heads for microsoft/Phi-3.5-mini-instruct (default)

Lightweight epistemic-uncertainty detector: K=4 prediction heads sharing the frozen microsoft/Phi-3.5-mini-instruct trunk. Configuration: mid_dim=128, K=4 (default).
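
The card does not spell out how the K heads are combined into a single score. One common choice for ensembles, shown here as an illustrative sketch and not as the eu_halt implementation, is the mean KL divergence from each head's next-token distribution to the ensemble average. The function name `head_disagreement` and the toy distributions are assumptions made for this example.

```python
import math

def head_disagreement(head_probs):
    """Mean KL(p_k || p_mean) over K heads for one token position.

    head_probs: list of K probability vectors over the same vocabulary.
    Returns ~0 when all heads agree and grows as they diverge.
    NOTE: an assumed formulation, not taken from the eu_halt source.
    """
    k = len(head_probs)
    vocab = len(head_probs[0])
    # Ensemble-average distribution across the K heads.
    mean = [sum(p[v] for p in head_probs) / k for v in range(vocab)]
    total = 0.0
    for p in head_probs:
        # mean[v] > 0 whenever p[v] > 0, so the ratio is always defined.
        total += sum(pv * math.log(pv / mean[v]) for v, pv in enumerate(p) if pv > 0.0)
    return total / k

# Toy 3-token vocabulary with K=4 heads.
agree = [[0.7, 0.2, 0.1]] * 4
split = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.9, 0.05, 0.05], [0.05, 0.9, 0.05]]
print(head_disagreement(agree))  # close to zero
print(head_disagreement(split))  # clearly positive
```

Under this formulation, heads that have converged on the same prediction contribute nothing, so the score rises only where the heads disagree.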

Quick start

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from eu_halt import attach

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct", torch_dtype=torch.bfloat16,
).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")

uncertainty = attach(
    model,
    heads_repo="debajyotidasgupta/eu-halt-phi-3.5-mini-instruct",
    mid_dim=128,
)
print(uncertainty("Who founded Quora in 2008?", tokenizer))
# Higher = more uncertain.

Files in this repo

  • heads_final.pt: final K=4 head state dict.
  • heads_step{500,1000,1500,2000,2500}.pt: intermediate checkpoints.
  • source_layers.json: the 4 trunk-layer indices the heads read from.
  • history.json: per-step loss, disagreement, and GPU stats.

Training

  • Dataset: HuggingFaceFW/fineweb-edu (streaming).
  • ~2-5M tokens, batch_size 2-4, seq_len 512, ~2000-2500 steps.
  • AdamW (lr 3e-4 to 5e-4), 100-200 warmup steps.
  • K=4 heads, mid_dim=128, training_noise_std=0.01, dropout=0.1 (or both 0 for quiet variants).
  • Single GPU (~10-15 min on RTX A5000/A6000/L40S).
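
As a sanity check on the figures above, the token budget follows directly from batch size, sequence length, and step count. The sketch below also shows a linear warmup; the card states only the warmup length, so the constant rate after warmup and the helper names `tokens_seen` and `warmup_lr` are assumptions.

```python
def tokens_seen(batch_size, seq_len, steps):
    """Total tokens processed, assuming every sequence is filled to seq_len."""
    return batch_size * seq_len * steps

def warmup_lr(step, peak_lr=3e-4, warmup_steps=100):
    """Linear warmup to peak_lr, then constant.

    The post-warmup shape is an assumption; the card does not state it.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr

print(tokens_seen(2, 512, 2000))  # 2048000, low end of the listed ranges
print(tokens_seen(4, 512, 2500))  # 5120000, high end of the listed ranges
print(warmup_lr(0), warmup_lr(99), warmup_lr(1500))
```

Both ends land inside the stated ~2-5M token range.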

Evaluation

OOD-detection AUROC (in-distribution vs. out-of-distribution), 2164 samples total:

Signal            AUROC   95% CI
disagreement      0.7420  [0.7193, 0.7614]
entropy           0.5382  [0.5150, 0.5612]
last_token_unc    0.4002  [0.3691, 0.4337]
mahalanobis       1.0000  [1.0000, 1.0000]
p_true            nan     [nan, nan]
semantic_entropy  nan     [nan, nan]

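Best signal: mahalanobis. An AUROC of 1.0000 with a zero-width CI means this signal separates the ID and OOD samples perfectly on this benchmark.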
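
The card does not say how the AUROC values or their intervals were computed. A minimal sketch of one standard recipe, pairwise-ranking AUROC with a percentile bootstrap, is given below; the function names, the resample count, and the toy data are assumptions, and the quadratic pairwise loop is only practical for small sets.

```python
import random

def auroc(scores, labels):
    """Chance that a random OOD sample (label 1) outscores a random ID sample (label 0).

    Ties count as half. Returns nan if either class is absent.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        a = auroc([scores[i] for i in idx], [labels[i] for i in idx])
        if a == a:  # drop resamples that lost one class (nan != nan)
            stats.append(a)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi

# Toy data: uncertainty scores with 0 = ID, 1 = OOD.
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
print(auroc(toy_scores, toy_labels))  # 0.75: three of four ID/OOD pairs are ordered correctly
print(bootstrap_ci(toy_scores, toy_labels))
```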

Intended use

  • Hallucination flagging at inference time (score before / during generation).
  • Dynamic-RAG gating (retrieve iff uncertainty > τ).
  • Selective prediction / risk-coverage trade-offs.
  • Token-level uncertainty visualization via uncertainty.per_token(text, tokenizer).
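
The gating and selective-prediction uses above reduce to a threshold test and a sort. The sketch below works on plain floats; in practice each score would come from a call like the one in Quick start, assuming it returns a single number. The helper names, the threshold 0.5, and the toy data are illustrative assumptions, and τ should be chosen on held-out data.

```python
def should_retrieve(score, tau):
    """Dynamic-RAG gate: retrieve only when uncertainty exceeds the threshold tau."""
    return score > tau

def risk_coverage(scores, correct):
    """Answer the least uncertain samples first; return (coverage, risk) at each cut.

    scores: uncertainty per sample (higher = more uncertain).
    correct: 1 if the model's answer was right, else 0.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    curve, errors = [], 0
    for rank, i in enumerate(order, start=1):
        errors += 1 - correct[i]
        curve.append((rank / len(scores), errors / rank))
    return curve

scores = [0.10, 0.80, 0.30, 0.95]
correct = [1, 0, 1, 0]
print([should_retrieve(s, tau=0.5) for s in scores])  # [False, True, False, True]
print(risk_coverage(scores, correct))
```

On this toy data the error rate stays at zero up to 50% coverage and rises only once the two high-uncertainty samples are answered, which is the behavior a useful uncertainty signal should show.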

Limitations

  • No fine-tuning of the trunk: only the auxiliary heads are trained.
  • Heads are trained on web text. Specialized domains (medical, legal) may need a domain-specific recalibration.
  • Head output projection size scales with the trunk's vocabulary. For a trunk with a 256k vocab such as Gemma, it is ~70-100M params per head, which is still small relative to the trunk.

License

Apache-2.0 for the heads. The trunk model microsoft/Phi-3.5-mini-instruct retains its own license; see its model card.

Citation

@misc{ais_eu_halt_2026,
  author = {Dasgupta, Debajyoti and Anthropic Claude},
  title  = {EU-Halt: Lightweight Multi-Head Epistemic Detectors for Frozen LLMs},
  year   = {2026},
  url    = {https://huggingface.co/debajyotidasgupta/eu-halt-phi-3.5-mini-instruct},
}