ssurface/tool-calling-hallucination-modernbert-base-unified-final

Fine-tuned answerdotai/ModernBERT-base for hallucination detection in tool-calling LLM assistants. Trained on the Unified ToolACE dataset (100% of the data, effective batch size 8) using the LettuceDetect framework.

The model predicts character-level hallucination spans in an assistant's answer given a tool-calling context (user query + tool response).
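
To make "character-level spans" concrete, here is a minimal sketch of mapping predicted spans back to the text they cover. It assumes each span is a dict with `start` and `end` character offsets into the answer string (end exclusive) plus a `label`; the helper name `extract_spans` and the example offsets are hypothetical, not model output.

```python
def extract_spans(answer, spans):
    """Return the answer substring covered by each predicted span."""
    return [answer[s["start"]:s["end"]] for s in spans]


answer = "The temperature in NYC is 85°F."
# Hypothetical prediction covering the unsupported value (assumed format)
spans = [{"start": 26, "end": 30, "label": "hallucination"}]

print(extract_spans(answer, spans))  # ['85°F']
```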

Quick start

import sys
sys.path.insert(0, "LettuceDetect")   # clone from github.com/KRLabsOrg/LettuceDetect
from lettucedetect.models.inference import HallucinationDetector

detector = HallucinationDetector(
    method="transformer",
    model_path="ssurface/tool-calling-hallucination-modernbert-base-unified-final",
)

# Single example
result = detector.predict(
    context='{"name": "get_weather", "results": {"temp": 72, "city": "NYC"}}',
    question="What is the weather in NYC?",
    answer="The temperature in NYC is 85°F.",   # hallucinated: the tool response says 72
)
print(result)
# Example output shape (exact offsets depend on the model):
# [{'start': 34, 'end': 39, 'label': 'hallucination'}]
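
Tool responses are usually held as Python dicts rather than strings. A minimal sketch of serializing one into a context string with the standard library, which avoids hand-escaping nested quotes. The variable names are illustrative, and whether `predict` expects a single string or a list of strings for `context` should be checked against the installed LettuceDetect version.

```python
import json

# Hypothetical tool response, mirroring the weather example
tool_response = {"name": "get_weather", "results": {"temp": 72, "city": "NYC"}}

# json.dumps yields a valid JSON string with no manual quote escaping
context = json.dumps(tool_response)

print(context)
# {"name": "get_weather", "results": {"temp": 72, "city": "NYC"}}
```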

Local inference (no API key needed)

The detector runs on CPU or GPU. For batch evaluation:

python scripts/evaluate_model.py \
    --model ssurface/tool-calling-hallucination-modernbert-base-unified-final \
    --data  lettucedetect_data/tool_calling_hallucination.json \
    --split test --by-type
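
For intuition about span-level scoring, here is a sketch of character-level precision, recall, and F1 between predicted and gold spans. This is one common way to score span detectors and is not necessarily what `evaluate_model.py` computes; the function names and example spans are hypothetical.

```python
def char_set(spans):
    """Expand [start, end) spans into the set of character offsets they cover."""
    return {i for s in spans for i in range(s["start"], s["end"])}


def char_prf(pred_spans, gold_spans):
    """Character-level precision, recall, and F1 for one example."""
    pred, gold = char_set(pred_spans), char_set(gold_spans)
    tp = len(pred & gold)  # characters flagged by both prediction and gold
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up spans: the prediction covers the gold span plus 3 extra characters
gold = [{"start": 26, "end": 30}]
pred = [{"start": 23, "end": 30}]
precision, recall, f1 = char_prf(pred, gold)
print(precision, recall, f1)
```

In this toy case the prediction covers 7 characters, 4 of which are gold, so precision is 4/7 and recall is 1.0.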

Training

python LettuceDetect/scripts/train.py \
    --ragtruth-path lettucedetect_data/tool_calling_hallucination.json \
    --model-name answerdotai/ModernBERT-base \
    --output-dir results/ \
    --batch-size 1 --grad-accum 8 \
    --epochs 6 --learning-rate 1e-5
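
The effective batch size of 8 follows from `--batch-size 1` times `--grad-accum 8`. The sketch below illustrates the idea on a toy one-parameter model: accumulating eight size-1 micro-batch gradients, each scaled by 1/8, gives the same gradient as one batch of 8. The training script's actual loop is not shown in this card, so treat this as an illustration of the arithmetic only; the data and model are made up.

```python
def grad(w, x, y):
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2 * (w * x - y) * x


batch_size, grad_accum = 1, 8  # values from the training command
effective_batch = batch_size * grad_accum

# Toy data: 8 (x, y) pairs, made up for illustration
data = [(float(i), 3.0 * i) for i in range(1, 9)]
w = 0.5

# One large batch: average the gradient over all 8 samples
full_grad = sum(grad(w, x, y) for x, y in data) / len(data)

# Accumulation: 8 micro-batches of size 1, each scaled by 1/grad_accum
accum_grad = 0.0
for x, y in data:
    accum_grad += grad(w, x, y) / grad_accum

print(effective_batch)                      # 8
print(abs(full_grad - accum_grad) < 1e-9)   # True
```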

Safetensors

Model size: 0.1B params
Tensor type: F32
