santiviquez posted an update Jan 29
Confidence *may be* all you need.

A simple average of the log probabilities of the output tokens from an LLM might be all it takes to tell if the model is hallucinating.🫨

The idea is that if the model is not confident (its output token probabilities are low), it may just be inventing stuff.

In these two papers:
1. https://aclanthology.org/2023.eacl-main.75/
2. https://arxiv.org/abs/2303.08896

The authors claim that this simple method is the best heuristic for detecting hallucinations. The beauty is that it only uses the generated token probabilities, so it can be implemented at inference time ⚡
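For anyone who wants to try it, here's a minimal sketch of how the score can be computed at inference time with the transformers library. The model name and prompt are just placeholders, not from the papers:

```python
# Minimal sketch of the average-logprob confidence heuristic,
# assuming the Hugging Face transformers library.
# The model name and prompt are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The capital of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")

# Ask generate() to also return the per-step scores of the generated tokens.
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=False,
    return_dict_in_generate=True,
    output_scores=True,
)

# compute_transition_scores turns the per-step logits into the log probability
# of each token that was actually generated.
transition_scores = model.compute_transition_scores(
    outputs.sequences, outputs.scores, normalize_logits=True
)

# The confidence score: mean log probability over the generated tokens.
avg_logprob = transition_scores[0].mean().item()
print(f"Average log probability: {avg_logprob:.3f}")
# The lower (more negative) this is, the less confident the model was,
# and the more likely the output is a hallucination under this heuristic.
```

Since it only needs the scores already produced during generation, it adds essentially no overhead on top of a normal generate() call.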

Love this paper too! It's simple yet powerful and applicable to black-box models.
I actually have a space to demonstrate it: https://huggingface.co/spaces/mithril-security/hallucination_detector

I also dig into it on an HF Blog post: https://huggingface.co/blog/dhuynh95/automatic-hallucination-detection


Ohh that’s so cool! I actually played with the space last week when I was reading the paper. Don’t remember how I found it 🤔

You might be interested in this follow-up work, which shows that fully intrinsic properties (in the form of attribution scores) outperform logprobs, especially on fully detached hallucinations, matching the abilities of supervised hallucination detectors: https://aclanthology.org/2023.acl-long.3/


Nice! Thank you, I'll take a look