ModernBERT-large: Tool-Calling Hallucination Span Detector
Fine-tuned ModernBERT-large for token-level hallucination span detection in
LLM tool-calling answers. Detects which answer spans are hallucinated
(answer_mismatch / missing_tool / overgeneration).
Result
Span-level IoU F1 (greedy matching, IoU ≥ 0.5) on the
s-nlp/toolace-unified-hallucinations test split:
| Model | Span F1 | answer_mismatch | missing_tool | overgeneration |
|---|---|---|---|---|
| Published baseline (s-nlp/tool-calling-hallucination-modernbert-base-unified-final) | 0.9176 | 0.8432 | 0.9895 | 0.9373 |
| This model | 0.9407 | 0.8978 | 0.9933 | 0.9393 |
+2.31 F1 points over the published base checkpoint (+2.5% relative). The
answer_mismatch bottleneck class improves from 0.8432 to 0.8978 (+6.5% relative).
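The span-level metric named above can be sketched as follows. This is a minimal illustration, not the evaluation script behind the reported numbers: it assumes spans are half-open character ranges, that candidate pairs are matched greedily in order of descending IoU, and that true positives are pooled over examples before computing F1. The names `span_iou`, `count_matches`, and `f1_from_counts` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def span_iou(a: Span, b: Span) -> float:
    """Intersection-over-union of two half-open character spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def count_matches(pred: List[Span], gold: List[Span], threshold: float = 0.5) -> int:
    """Greedy one-to-one matching: take the highest-IoU pairs first."""
    pairs = sorted(
        ((span_iou(p, g), i, j) for i, p in enumerate(pred) for j, g in enumerate(gold)),
        reverse=True,
    )
    used_pred, used_gold = set(), set()
    for iou, i, j in pairs:
        if iou < threshold:
            break  # remaining pairs are all below the threshold
        if i in used_pred or j in used_gold:
            continue  # each span may be matched at most once
        used_pred.add(i)
        used_gold.add(j)
    return len(used_pred)


def f1_from_counts(tp: int, n_pred: int, n_gold: int) -> float:
    """F1 from pooled counts of matches, predicted spans, and gold spans."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0
```

For a corpus-level score under these assumptions, sum `count_matches`, `len(pred)`, and `len(gold)` over all test examples and pass the totals to `f1_from_counts`.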
Usage
Two-segment tokenization (prompt + answer); predict per answer token:
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
tok = AutoTokenizer.from_pretrained("etomoscow/tool-calling-hallucination-modernbert-large")
model = AutoModelForTokenClassification.from_pretrained("etomoscow/tool-calling-hallucination-modernbert-large").eval()
prompt = "<conversation context>" # System: ... \n\n User: ... \n\n Tool: ...
answer = "<final answer>"
enc = tok(
    prompt, answer,
    truncation="only_first",  # truncate the prompt, never the answer
    max_length=4096,
    return_offsets_mapping=True,
    return_tensors="pt",
)
offsets = enc.pop("offset_mapping")  # offsets are not a model input
with torch.no_grad():
    preds = torch.argmax(model(**enc).logits, -1)[0]
# Answer tokens with preds == 1 belong to hallucination spans;
# use offsets to map them back to character positions in `answer`.
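The final mapping step (token predictions back to answer characters) might look like the sketch below. It assumes that answer tokens carry sequence id 1, that special tokens have an empty offset, and that adjacent flagged tokens should be merged into one span. The helper name `preds_to_char_spans` is hypothetical and is not part of the released model or of transformers.

```python
from typing import List, Optional, Sequence, Tuple


def preds_to_char_spans(
    preds: Sequence[int],
    offsets: Sequence[Tuple[int, int]],
    sequence_ids: Sequence[Optional[int]],
) -> List[Tuple[int, int]]:
    """Merge consecutive answer tokens labeled 1 into (start, end) character spans."""
    spans: List[Tuple[int, int]] = []
    open_span = False
    for label, (start, end), seq_id in zip(preds, offsets, sequence_ids):
        # Keep only flagged tokens from the second segment (the answer).
        # Special tokens have sequence id None and an empty offset.
        is_hit = seq_id == 1 and label == 1 and end > start
        if is_hit and open_span:
            spans[-1] = (spans[-1][0], end)  # extend the current span
        elif is_hit:
            spans.append((start, end))  # start a new span
        open_span = is_hit
    return spans
```

With the objects from the usage snippet, a call could be `preds_to_char_spans(preds.tolist(), offsets[0].tolist(), enc.sequence_ids(0))`, after which `answer[start:end]` gives the text of each flagged span.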
Training
- Backbone: answerdotai/ModernBERT-large (fine-tuned from the general backbone).
- Data: full s-nlp/toolace-unified-hallucinations train split; answer_mismatch oversampled 3× (targets the bottleneck class).
- LR 5e-5, cosine, effective batch 16, bf16, flash-attention 2, 8 epochs, max_len 4096.
- Labels: per answer token, 1 if inside a gold hallucination span else 0.
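The labeling and oversampling steps listed above can be illustrated roughly as follows. This is a sketch under stated assumptions rather than the actual training code: a token counts as "inside" a gold span if its character range overlaps the span, prompt and special tokens receive `-100` (the default `ignore_index` of PyTorch's cross-entropy loss) so they do not contribute to the loss, and each example stores its hallucination class in a `"type"` field. The function names and the `"type"` field are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def token_labels(
    offsets: Sequence[Tuple[int, int]],
    sequence_ids: Sequence[Optional[int]],
    gold_spans: Sequence[Tuple[int, int]],
    ignore_index: int = -100,
) -> List[int]:
    """Label each answer token 1 if it overlaps a gold span, else 0."""
    labels = []
    for (start, end), seq_id in zip(offsets, sequence_ids):
        if seq_id != 1 or end <= start:
            # Assumption: prompt and special tokens are masked out of the loss.
            labels.append(ignore_index)
            continue
        inside = any(start < g_end and end > g_start for g_start, g_end in gold_spans)
        labels.append(int(inside))
    return labels


def oversample(examples: List[Dict], target_type: str = "answer_mismatch", factor: int = 3) -> List[Dict]:
    """Repeat examples of one class so each appears `factor` times in total."""
    out: List[Dict] = []
    for ex in examples:
        # "type" is a hypothetical field name for the hallucination class.
        out.extend([ex] * (factor if ex.get("type") == target_type else 1))
    return out
```

Overlap-based labeling is one reasonable reading of "inside a gold span"; a stricter variant would require the token to be fully contained in the span.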
License & attribution
MIT. Trained on s-nlp/toolace-unified-hallucinations. Built to improve on the
s-nlp/tool-calling-hallucination-modernbert-base-unified-final baseline.