ToolACE hallucination DeBERTa v3 small token classifier
This model is a DeBERTa v3 small token classifier trained for span-level hallucination detection in tool-calling responses.
Labels
normal
contradiction
overgeneration
missing_tool
Question and context tokens are ignored during training. The classifier is trained only on answer tokens.
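One common way to ignore non-answer tokens, sketched here as an assumption rather than as the training code used for this model, is to give them the label -100, which PyTorch's cross-entropy loss skips by default. The helper name `mask_non_answer_labels` and the label order in `LABEL2ID` are hypothetical; check `model.config.label2id` for the actual mapping.

```python
# Hypothetical label order; the real mapping lives in model.config.label2id.
LABEL2ID = {"normal": 0, "contradiction": 1, "overgeneration": 2, "missing_tool": 3}
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_non_answer_labels(token_labels, is_answer_token):
    """Return label ids, replacing every non-answer position with IGNORE_INDEX."""
    if len(token_labels) != len(is_answer_token):
        raise ValueError("token_labels and is_answer_token must have equal length")
    return [
        LABEL2ID[label] if in_answer else IGNORE_INDEX
        for label, in_answer in zip(token_labels, is_answer_token)
    ]


# Two prompt tokens (question/context) followed by three answer tokens.
labels = ["normal", "normal", "normal", "contradiction", "normal"]
answer_mask = [False, False, True, True, True]
masked = mask_non_answer_labels(labels, answer_mask)
print(masked)
```

With this scheme the prompt positions never contribute to the loss, so only answer tokens drive the classifier's updates.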
Training data
The model was trained on the ToolACE RAGTruth-style synthetic hallucination dataset.
Dataset repository:
Ali-Bhai/toolace_ragtruth_synthetic
Validation-tuned thresholds
{ "contradiction": 0.9, "overgeneration": 0.0, "missing_tool": 0.0 }
Final combined test results
| method | precision | recall | f1 |
|---|---|---|---|
| Improved DeBERTa tuned thresholds | 0.976366 | 0.992562 | 0.984397 |
| Template suffix sanity baseline | 1 | 0.96786 | 0.983667 |
| Improved DeBERTa token classifier | 0.96975 | 0.992788 | 0.981134 |
| LettuceDetect transformer | 0.339035 | 0.787189 | 0.473946 |
| LookBackLens style distilgpt2 | 0.346983 | 0.655472 | 0.453761 |
| Unsupported value heuristic | 0.170562 | 0.0187523 | 0.0337895 |
| Empty prediction | 0 | 0 | 0 |
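The f1 column is consistent with the standard harmonic mean of precision and recall. A minimal check, using rows copied from the table above (the function name `f1_score` is illustrative, and returning 0 when both inputs are 0 is a convention assumed here):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


tuned = f1_score(0.976366, 0.992562)  # Improved DeBERTa tuned thresholds
baseline = f1_score(1.0, 0.96786)     # Template suffix sanity baseline
empty = f1_score(0.0, 0.0)            # Empty prediction
print(round(tuned, 4), round(baseline, 4), empty)
```

The recomputed values agree with the reported f1 entries up to rounding of the inputs.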
Final per-type F1 table
| method | contradiction | missing_tool | overgeneration |
|---|---|---|---|
| Empty prediction | 0 | 0 | 0 |
| Improved DeBERTa token classifier | 0.67071 | 0.983388 | 0.987825 |
| Improved DeBERTa tuned thresholds | 0.700767 | 0.987462 | 0.990821 |
| LettuceDetect transformer | 0.0488308 | 0.287056 | 0.558726 |
| LookBackLens style distilgpt2 | 0.0139008 | 0.414203 | 0.450776 |
| Template suffix sanity baseline | 0 | 1 | 1 |
| Unsupported value heuristic | 0.377397 | 0 | 0.00618375 |
Loading
Install transformers, then load the model.
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "Ali-Bhai/toolace_hallucination_deberta_v3_small"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# Show the mapping from class indices to label names
print(model.config.id2label)
```
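Because the classifier labels individual tokens, span-level output needs a grouping step. The sketch below merges consecutive tokens that share a non-normal label into character spans. It assumes you already have one predicted label and one `(start, end)` character offset per answer token (for example from a fast tokenizer called with `return_offsets_mapping=True`). The function name `merge_token_spans` is hypothetical, and this is not necessarily the grouping used in the reported evaluation.

```python
def merge_token_spans(token_labels, offsets, normal_label="normal"):
    """Merge runs of adjacent tokens with the same non-normal label into spans."""
    spans = []
    current = None  # span being built: {"label", "start", "end"}
    for label, (start, end) in zip(token_labels, offsets):
        if label == normal_label:
            # A normal token closes any span in progress.
            if current is not None:
                spans.append(current)
                current = None
            continue
        if current is not None and current["label"] == label:
            current["end"] = end  # extend the running span
        else:
            if current is not None:
                spans.append(current)
            current = {"label": label, "start": start, "end": end}
    if current is not None:
        spans.append(current)
    return spans


# Made-up labels and character offsets for five answer tokens.
token_labels = ["normal", "contradiction", "contradiction", "normal", "overgeneration"]
offsets = [(0, 3), (4, 9), (10, 14), (15, 17), (18, 25)]
spans = merge_token_spans(token_labels, offsets)
print(spans)
```

Each returned span can be sliced out of the answer text with `answer[span["start"]:span["end"]]`.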
Important evaluation note
The final reported method applies validation-tuned thresholds to the token classifier outputs.
Validation-tuned thresholds:
{ "contradiction": 0.9, "overgeneration": 0.0, "missing_tool": 0.0 }
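One plausible reading of these thresholds, stated here as an assumption because the card does not spell out the rule, is that a token keeps its highest-probability hallucination label only when that probability reaches the class threshold, and otherwise falls back to `normal`. Under that reading a threshold of 0.0 leaves the plain argmax decision unchanged. The function name `apply_thresholds` and the example probabilities are made up for illustration.

```python
THRESHOLDS = {"contradiction": 0.9, "overgeneration": 0.0, "missing_tool": 0.0}


def apply_thresholds(token_probs, thresholds=THRESHOLDS, normal_label="normal"):
    """Assumed rule: demote a predicted class to normal if its probability is below its threshold."""
    predictions = []
    for probs in token_probs:  # probs: dict mapping label name -> probability
        best = max(probs, key=probs.get)
        if best != normal_label and probs[best] < thresholds.get(best, 0.0):
            best = normal_label
        predictions.append(best)
    return predictions


# Illustrative per-token probabilities, not real model outputs.
example = [
    {"normal": 0.10, "contradiction": 0.85, "overgeneration": 0.03, "missing_tool": 0.02},
    {"normal": 0.03, "contradiction": 0.95, "overgeneration": 0.01, "missing_tool": 0.01},
    {"normal": 0.30, "contradiction": 0.05, "overgeneration": 0.60, "missing_tool": 0.05},
]
decided = apply_thresholds(example)
print(decided)
```

This reading is at least consistent with the combined test results, where the tuned variant has higher precision and marginally lower recall than the untuned token classifier.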
Caveat
This model was trained on synthetic hallucination labels. It should be evaluated carefully before use on naturally occurring production hallucinations.
Base model: microsoft/deberta-v3-small