ssurface/tool-calling-hallucination-modernbert-base-unified-final
A fine-tune of answerdotai/ModernBERT-base for hallucination detection in tool-calling LLM assistants. It was trained on the Unified ToolACE dataset (100% of the data, effective batch size 8) with the LettuceDetect framework.
The model predicts character-level hallucination spans in an assistant's answer given a tool-calling context (user query + tool response).
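Because the tool response reaches the detector as plain text, one straightforward way to produce it is to serialize the tool result with `json.dumps`. The snippet below is a minimal sketch: the helper name `build_tool_context` and the exact layout of the context string are assumptions made for illustration, not something this model card specifies.

```python
import json


def build_tool_context(tool_name: str, results: dict) -> str:
    """Serialize one tool result into a JSON string (hypothetical helper)."""
    # sort_keys keeps the string stable across runs; ensure_ascii=False keeps
    # non-ASCII characters readable instead of escaping them.
    payload = {"name": tool_name, "results": results}
    return json.dumps(payload, sort_keys=True, ensure_ascii=False)


context = build_tool_context("get_weather", {"temp": 72, "city": "NYC"})
print(context)
# {"name": "get_weather", "results": {"city": "NYC", "temp": 72}}
```

Building the string programmatically also avoids hand-escaping the double quotes inside the JSON.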
Quick start
import sys
sys.path.insert(0, "LettuceDetect") # clone from github.com/KRLabsOrg/LettuceDetect
from lettucedetect.models.inference import HallucinationDetector
detector = HallucinationDetector(
    method="transformer",
    model_path="ssurface/tool-calling-hallucination-modernbert-base-unified-final",
)
# Single example
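result = detector.predict(
    # Upstream LettuceDetect expects a list of context passages.
    context=['{"name": "get_weather", "results": {"temp": 72, "city": "NYC"}}'],
    question="What is the weather in NYC?",
    answer="The temperature in NYC is 85°F.",  # hallucinated: the tool returned 72
    output_format="spans",  # character-level spans rather than token-level output
)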
print(result)
# Illustrative output; the span should cover "85°F", i.e. answer[26:30]:
# [{'start': 26, 'end': 30, 'label': 'hallucination'}]
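The returned offsets index into the answer string, so they can be used to pull out or highlight the flagged text. The following is a sketch under two assumptions: spans are dicts with `start` and `end` keys, and offsets are half-open (`start` inclusive, `end` exclusive). The helpers `extract_spans` and `mark_spans` are hypothetical and not part of LettuceDetect, and the span list is written by hand rather than taken from a model run.

```python
def extract_spans(answer: str, spans: list) -> list:
    """Return the substring of `answer` covered by each span."""
    return [answer[s["start"]:s["end"]] for s in spans]


def mark_spans(answer: str, spans: list, open_tag: str = "[[", close_tag: str = "]]") -> str:
    """Wrap each span in markers, right to left so earlier offsets stay valid."""
    marked = answer
    for s in sorted(spans, key=lambda s: s["start"], reverse=True):
        start, end = s["start"], s["end"]
        marked = marked[:start] + open_tag + marked[start:end] + close_tag + marked[end:]
    return marked


answer = "The temperature in NYC is 85°F."
# Hand-written span for illustration, not actual model output.
spans = [{"start": 26, "end": 30, "label": "hallucination"}]

flagged = extract_spans(answer, spans)
highlighted = mark_spans(answer, spans)
print(flagged)      # ['85°F']
print(highlighted)  # The temperature in NYC is [[85°F]].
```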
Local vLLM inference (no API key needed)
The detector runs on CPU or GPU. For batch evaluation:
python scripts/evaluate_model.py \
    --model ssurface/tool-calling-hallucination-modernbert-base-unified-final \
    --data lettucedetect_data/tool_calling_hallucination.json \
    --split test --by-type
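For readers who want to score span predictions themselves, one common approach is character-level precision, recall, and F1 between predicted and gold spans. The sketch below shows that approach only; it is an assumption that it resembles what `evaluate_model.py` reports, and the function names and the example spans are made up for illustration.

```python
def span_chars(spans: list) -> set:
    """Expand half-open [start, end) spans into a set of character indices."""
    chars = set()
    for s in spans:
        chars.update(range(s["start"], s["end"]))
    return chars


def char_prf(pred_spans: list, gold_spans: list) -> tuple:
    """Character-level precision, recall, and F1 for one answer."""
    pred, gold = span_chars(pred_spans), span_chars(gold_spans)
    overlap = len(pred & gold)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hand-written spans: the prediction covers the gold span plus 3 extra characters.
gold = [{"start": 26, "end": 30}]
pred = [{"start": 23, "end": 30}]

p, r, f = char_prf(pred, gold)
print(f"{p:.3f} {r:.3f} {f:.3f}")  # 0.571 1.000 0.727
```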
Training
python LettuceDetect/scripts/train.py \
    --ragtruth-path lettucedetect_data/tool_calling_hallucination.json \
    --model-name answerdotai/ModernBERT-base \
    --output-dir results/ \
    --batch-size 1 --grad-accum 8 \
    --epochs 6 --learning-rate 1e-5
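The flags `--batch-size 1 --grad-accum 8` correspond to the effective batch size of 8 stated above (1 × 8). The toy example below illustrates why gradient accumulation is equivalent to a larger batch, assuming a mean-reduced loss and each micro-batch gradient scaled by 1/`grad_accum`. It uses a one-parameter model in plain Python and made-up data, and does not reflect the internals of `train.py`.

```python
def grad_mse(w: float, batch: list) -> float:
    """Gradient of the mean squared error of y = w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


# Toy (x, y) pairs, made up for illustration.
data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.0),
        (5.0, 9.5), (6.0, 12.5), (7.0, 13.5), (8.0, 16.0)]
w = 0.5
batch_size, grad_accum = 1, 8
effective_batch_size = batch_size * grad_accum

# Gradient from one full batch of 8 examples.
full_grad = grad_mse(w, data)

# Gradient accumulated over 8 micro-batches of 1 example each.
accum_grad = 0.0
for i in range(0, len(data), batch_size):
    micro_batch = data[i:i + batch_size]
    accum_grad += grad_mse(w, micro_batch) / grad_accum

print(effective_batch_size)                 # 8
print(abs(full_grad - accum_grad) < 1e-9)   # True
```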