ModernBERT-large: Tool-Calling Hallucination Span Detector

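ModernBERT-large fine-tuned for token-level hallucination span detection in the final answers of tool-calling LLMs. Given the conversation context and the answer, it marks which answer spans are hallucinated. The data covers three hallucination types: answer_mismatch, missing_tool, and overgeneration.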

Result

Span-level IoU F1 (greedy span matching, match threshold IoU ≥ 0.5) on the s-nlp/toolace-unified-hallucinations test split, overall and per hallucination type:

| Model | Span F1 (overall) | answer_mismatch | missing_tool | overgeneration |
|---|---|---|---|---|
| Published baseline (s-nlp/tool-calling-hallucination-modernbert-base-unified-final) | 0.9176 | 0.8432 | 0.9895 | 0.9373 |
| This model | 0.9407 | 0.8978 | 0.9933 | 0.9393 |

Overall span F1 improves by 2.31 points over the published base-size checkpoint (+2.5% relative). The bottleneck class, answer_mismatch, improves from 0.8432 to 0.8978 (+5.46 points, +6.5% relative).
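The headline metric can be sketched as follows. This is a minimal sketch, not the official evaluation script. It assumes that spans are half-open character intervals, that "greedy matching" means taking candidate prediction/gold pairs in descending IoU order with one-to-one assignment, and that counts are micro-averaged over all test examples. The helper names (`span_iou`, `greedy_matches`, `span_f1`) are hypothetical, and the real script may differ, for example in how it scores examples that have neither gold nor predicted spans.

```python
from typing import Sequence, Tuple

Span = Tuple[int, int]  # half-open character interval [start, end)


def span_iou(a: Span, b: Span) -> float:
    """Intersection over union of two character intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def greedy_matches(pred: Sequence[Span], gold: Sequence[Span], threshold: float = 0.5) -> int:
    """Count one-to-one pred/gold matches, taking pairs in descending IoU order (assumed greedy rule)."""
    candidates = sorted(
        ((span_iou(p, g), i, j) for i, p in enumerate(pred) for j, g in enumerate(gold)),
        reverse=True,
    )
    used_pred, used_gold = set(), set()
    for iou, i, j in candidates:
        if iou < threshold:
            break  # all remaining pairs have lower IoU
        if i not in used_pred and j not in used_gold:
            used_pred.add(i)
            used_gold.add(j)
    return len(used_pred)


def span_f1(
    pred_docs: Sequence[Sequence[Span]],
    gold_docs: Sequence[Sequence[Span]],
    threshold: float = 0.5,
) -> float:
    """Micro-averaged span F1 over examples (assumption: micro-averaging)."""
    tp = n_pred = n_gold = 0
    for pred, gold in zip(pred_docs, gold_docs):
        tp += greedy_matches(pred, gold, threshold)
        n_pred += len(pred)
        n_gold += len(gold)
    if n_pred == 0 and n_gold == 0:
        return 1.0  # assumption: nothing to find and nothing predicted counts as perfect
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `span_f1([[(0, 8), (22, 40)]], [[(0, 10), (20, 30)]])` returns 0.5. The first pair has IoU 0.8 and is matched. The second has IoU 0.4, falls below the threshold, and counts as one false positive and one false negative.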

Usage

Tokenize the prompt and the answer as a sentence pair (truncating only the prompt), then read the per-token predictions for the answer tokens:

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tok = AutoTokenizer.from_pretrained("etomoscow/tool-calling-hallucination-modernbert-large")
model = AutoModelForTokenClassification.from_pretrained("etomoscow/tool-calling-hallucination-modernbert-large").eval()

prompt = "<conversation context>"      # System: ... \n\n User: ... \n\n Tool: ...
answer = "<final answer>"
enc = tok(prompt, answer, truncation="only_first", max_length=4096, return_offsets_mapping=True, return_tensors="pt")
offsets = enc.pop("offset_mapping")
with torch.no_grad():
    preds = torch.argmax(model(**enc).logits, -1)[0]
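# Only answer tokens (segment 1) are meaningful; prompt and special tokens are ignored.
answer_mask = torch.tensor([sid == 1 for sid in enc.sequence_ids(0)])
hallucinated = (preds == 1) & answer_mask
# offsets[0][hallucinated] holds the (start, end) character offsets of flagged tokens in `answer`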
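Consecutive hallucinated answer tokens can be merged to turn token predictions into character spans of the answer. This is a minimal sketch. It assumes a fast tokenizer, so that offsets for the second segment are relative to `answer` and `enc.sequence_ids(0)` marks answer tokens with 1. `preds_to_char_spans` is a hypothetical helper, not part of this model's or the Transformers library's API.

```python
from typing import List, Optional, Sequence, Tuple


def preds_to_char_spans(
    preds: Sequence[int],
    offsets: Sequence[Tuple[int, int]],
    sequence_ids: Sequence[Optional[int]],
    answer_segment: int = 1,
) -> List[Tuple[int, int]]:
    """Merge runs of consecutive answer tokens predicted as 1 into (start, end) character spans."""
    spans: List[Tuple[int, int]] = []
    current: Optional[List[int]] = None  # [start, end] of the span being built
    for pred, (start, end), seg in zip(preds, offsets, sequence_ids):
        if seg == answer_segment and pred == 1:
            if current is None:
                current = [start, end]  # open a new span at this token
            else:
                current[1] = max(current[1], end)  # extend the open span
        elif current is not None:
            # A non-hallucinated, prompt, or special token closes the open span.
            spans.append((current[0], current[1]))
            current = None
    if current is not None:
        spans.append((current[0], current[1]))
    return spans
```

Using the variables from the snippet above, `preds_to_char_spans(preds.tolist(), offsets[0].tolist(), enc.sequence_ids(0))` returns `(start, end)` pairs, and `answer[start:end]` recovers each flagged span. Merging by token index rather than by character gap keeps a single space between two flagged tokens from splitting a span.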

Training

  • Backbone: answerdotai/ModernBERT-large, fine-tuned directly from the general-purpose pretrained checkpoint.
  • Data: the full s-nlp/toolace-unified-hallucinations train split, with answer_mismatch examples oversampled 3× to target the bottleneck class.
  • Optimization: learning rate 5e-5 with a cosine schedule, effective batch size 16, bf16, FlashAttention 2, 8 epochs, max length 4096 tokens.
  • Labels: one per answer token, 1 if the token lies inside a gold hallucination span, else 0.
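The labeling and oversampling steps above can be sketched as follows. This is a hedged illustration, not the training code used for this model. Several choices are assumptions. Prompt and special tokens get -100 so they are excluded from the loss. A token counts as "inside" a gold span if it overlaps it by at least one character. Each example carries a hallucination type under a field named `hallucination_type`, which is a hypothetical name. The helpers `token_labels` and `oversample` are also hypothetical.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

IGNORE_INDEX = -100  # assumption: prompt/special tokens are masked out of the loss


def token_labels(
    offsets: Sequence[Tuple[int, int]],
    sequence_ids: Sequence[Optional[int]],
    gold_spans: Sequence[Tuple[int, int]],
    answer_segment: int = 1,
) -> List[int]:
    """1 for answer tokens overlapping a gold span, 0 for other answer tokens, IGNORE_INDEX elsewhere."""
    labels: List[int] = []
    for (start, end), seg in zip(offsets, sequence_ids):
        if seg != answer_segment:
            labels.append(IGNORE_INDEX)
        elif any(start < g_end and end > g_start for g_start, g_end in gold_spans):
            labels.append(1)  # any character overlap counts as "inside" (assumption)
        else:
            labels.append(0)
    return labels


def oversample(
    examples: Sequence[Dict[str, Any]],
    target_type: str = "answer_mismatch",
    factor: int = 3,
    type_key: str = "hallucination_type",  # hypothetical field name
) -> List[Dict[str, Any]]:
    """Repeat every example of `target_type` `factor` times; keep all other examples once."""
    out: List[Dict[str, Any]] = []
    for ex in examples:
        out.extend([ex] * (factor if ex.get(type_key) == target_type else 1))
    return out
```

Duplicated examples are left in place here. A shuffling data loader, such as the one used by the Transformers `Trainer` for training, spreads them across batches.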

License & attribution

MIT. Trained on s-nlp/toolace-unified-hallucinations. Built to improve on the s-nlp/tool-calling-hallucination-modernbert-base-unified-final baseline.

Model size: 0.4B parameters (Safetensors, BF16).