ssurface/tool-calling-hallucination-modernbert-base-glaive-100pct

Fine-tuned answerdotai/ModernBERT-base for hallucination detection in tool-calling LLM assistants. Trained with the LettuceDetect framework on Glaive (ours), using 100% of the data and an effective batch size of 8.

The model predicts character-level hallucination spans in an assistant's answer given a tool-calling context (user query + tool response).
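Because the tool response is usually JSON, one convenient way to assemble the context string is to serialize it with `json.dumps`, which avoids hand-escaping quotes in Python source. The sketch below is illustrative only: `build_context` is a hypothetical helper (not part of LettuceDetect), and the name/results layout is assumed from the example in this card rather than from a documented schema.

```python
import json


def build_context(tool_name: str, results: dict) -> str:
    """Hypothetical helper: serialize a tool response into a context string."""
    return json.dumps({"name": tool_name, "results": results})


context = build_context("get_weather", {"temp": 72, "city": "NYC"})
print(context)
# {"name": "get_weather", "results": {"temp": 72, "city": "NYC"}}
```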

Quick start

import sys
sys.path.insert(0, "LettuceDetect")   # clone from github.com/KRLabsOrg/LettuceDetect
from lettucedetect.models.inference import HallucinationDetector

detector = HallucinationDetector(
    method="transformer",
    model_path="ssurface/tool-calling-hallucination-modernbert-base-glaive-100pct",
)

# Single example
result = detector.predict(
    context='{"name": "get_weather", "results": {"temp": 72, "city": "NYC"}}',
    question="What is the weather in NYC?",
    answer="The temperature in NYC is 85°F.",   # hallucinated: the tool returned 72
)
print(result)
# [({'start': 34, 'end': 39, 'label': 'hallucination'})]
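The predicted offsets index into the answer string, so the flagged text can be recovered by slicing. This is a minimal sketch, assuming each span is a dict with `start` and `end` keys and that `end` is exclusive (Python slice convention). `flagged_text` is a hypothetical helper, and the span below is picked by hand to cover "85°F"; it is not real model output.

```python
def flagged_text(answer: str, spans: list[dict]) -> list[str]:
    """Return the substring of `answer` covered by each span (end-exclusive)."""
    return [answer[s["start"]:s["end"]] for s in spans]


answer = "The temperature in NYC is 85°F."
# Hand-written illustrative span, not produced by the detector.
spans = [{"start": 26, "end": 30, "label": "hallucination"}]
print(flagged_text(answer, spans))  # ['85°F']
```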

Local inference (no API key needed)

The detector runs on CPU or GPU. For batch evaluation:

python scripts/evaluate_model.py \
    --model ssurface/tool-calling-hallucination-modernbert-base-glaive-100pct \
    --data  lettucedetect_data/tool_calling_hallucination.json \
    --split test --by-type
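For intuition about span-level scoring, the sketch below computes character-level precision, recall, and F1 between predicted and gold spans. It is an assumption about what such an evaluation could look like, not a description of what `evaluate_model.py` actually computes; `span_chars` and `char_prf` are hypothetical names, and spans are treated as end-exclusive.

```python
def span_chars(spans: list[dict]) -> set[int]:
    """Collect the character indices covered by a list of end-exclusive spans."""
    chars: set[int] = set()
    for s in spans:
        chars.update(range(s["start"], s["end"]))
    return chars


def char_prf(pred: list[dict], gold: list[dict]) -> tuple[float, float, float]:
    """Character-level precision, recall, and F1 of predicted vs. gold spans."""
    p, g = span_chars(pred), span_chars(gold)
    tp = len(p & g)  # characters flagged by both prediction and gold
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hand-written example: the prediction covers the gold span plus two extra characters.
gold = [{"start": 26, "end": 30}]
pred = [{"start": 24, "end": 30}]
precision, recall, f1 = char_prf(pred, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```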

Training

python LettuceDetect/scripts/train.py \
    --ragtruth-path lettucedetect_data/tool_calling_hallucination.json \
    --model-name answerdotai/ModernBERT-base \
    --output-dir results/ \
    --batch-size 1 --grad-accum 8 \
    --epochs 6 --learning-rate 1e-5
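
The effective batch size of 8 follows from the two flags above: a per-step batch size of 1 multiplied by 8 gradient-accumulation steps. The arithmetic can be sketched as below; `num_examples` is a made-up dataset size used only for illustration, and how a trailing partial batch is handled depends on the training script.

```python
import math

batch_size = 1   # --batch-size
grad_accum = 8   # --grad-accum
effective_batch_size = batch_size * grad_accum
print(effective_batch_size)  # 8

num_examples = 1000  # hypothetical dataset size, not the real one
optimizer_steps_per_epoch = math.ceil(num_examples / effective_batch_size)
print(optimizer_steps_per_epoch)  # 125
```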

Model size: 0.1B params (Safetensors, tensor type F32)