santiviquez posted an update Feb 16
So, I have this idea to (potentially) improve uncertainty quantification for LLM hallucination detection.

The premise is that not all output tokens of a generated response are equally important. Hallucinations are more dangerous when they show up as nouns, dates, numbers, etc.

The idea is to have a "token selection" layer that filters the sequence of output token probabilities. Then we use only the probabilities of the relevant tokens to compute uncertainty quantification metrics.
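To make the filtering step concrete, here is a minimal sketch in Python. It assumes we already have the per-token probabilities of the generated response and a boolean relevance mask; token_probs, relevant, and selective_uncertainty are hypothetical names, and the metric is just the mean negative log-probability over the selected tokens:

```python
import numpy as np

def selective_uncertainty(token_probs, relevant):
    """Mean negative log-probability over the tokens flagged as relevant.

    token_probs: probability the model assigned to each generated token.
    relevant:    boolean mask of the same length marking the tokens we
                 care about (nouns, dates, numbers, ...).
    """
    token_probs = np.asarray(token_probs, dtype=float)
    relevant = np.asarray(relevant, dtype=bool)
    if not relevant.any():
        # No relevant tokens found -> fall back to scoring the whole sequence.
        relevant = np.ones_like(relevant)
    return float(-np.log(token_probs[relevant] + 1e-12).mean())

# Toy usage: only the low-probability "relevant" token drives the score.
print(selective_uncertainty([0.9, 0.2, 0.95], [False, True, False]))
```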

The big question is how we know which tokens are the relevant ones. 🤔

My idea is to take the decoded output sequence, run an NLP model over it (it doesn't need to be a fancy one) for named entity recognition and part-of-speech tagging, and then do uncertainty quantification only on the tokens we have marked as relevant (nouns, dates, numbers, etc.).
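For the token-selection step, here's a rough sketch of what I have in mind, using spaCy purely as an example tagger (assumes en_core_web_sm is installed; the word-to-subword alignment via character offsets is a simplification, and relevance_mask / token_offsets are hypothetical names):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # any lightweight POS/NER model would do

RELEVANT_POS = {"NOUN", "PROPN", "NUM"}
RELEVANT_ENTS = {"DATE", "TIME", "CARDINAL", "QUANTITY", "PERSON", "ORG", "GPE"}

def relevance_mask(decoded_text, token_offsets):
    """Mark which LLM tokens overlap a "relevant" word in the decoded output.

    decoded_text:  the generated response as a plain string.
    token_offsets: one (start, end) character span per LLM token, e.g. from a
                   fast tokenizer's offset mapping (assumed to be available).
    """
    doc = nlp(decoded_text)
    # Character spans of words whose POS tag or entity type we care about.
    relevant_spans = [
        (tok.idx, tok.idx + len(tok.text))
        for tok in doc
        if tok.pos_ in RELEVANT_POS or tok.ent_type_ in RELEVANT_ENTS
    ]
    # An LLM token is relevant if its span overlaps any relevant word span.
    return [
        any(start < s_end and end > s_start for s_start, s_end in relevant_spans)
        for start, end in token_offsets
    ]
```

The mask produced here is exactly what the filtering step above would consume.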

What are your thoughts? Has anyone tried this before, and would it have an impact on the correlation with human-annotated evaluations?

@gsarti curious to know if you have seen something like this. It is very similar to a weighted version of UQ, but not exactly... haha

gsarti replied:

Hey @santiviquez , this is quite similar to what we propose in the Context-sensitive Token Identification (CTI) step of our PECoRe framework (https://openreview.net/forum?id=XTHfNGI3zT), with the main difference that you define as "salient" anything matching some heuristic (e.g. NER/POS), while for us relevance is given by how the generated token's probability is affected by the presence/absence of context.
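For intuition only, a toy version of that contrastive idea could look like the snippet below; this is not the actual PECoRe implementation, and p_with_context / p_without_context are hypothetical per-token probabilities from two forward passes of the same model, with and without the context prepended:

```python
import numpy as np

def context_sensitivity(p_with_context, p_without_context):
    """Toy per-token score: how much does conditioning on the context shift
    the probability of each generated token? Larger shifts suggest the token
    is context-sensitive. Illustrative only, not the PECoRe method itself."""
    p_ctx = np.asarray(p_with_context, dtype=float)
    p_no_ctx = np.asarray(p_without_context, dtype=float)
    return np.log(p_ctx + 1e-12) - np.log(p_no_ctx + 1e-12)

# The second token's probability jumps when the context is present.
print(context_sensitivity([0.8, 0.5, 0.9], [0.8, 0.05, 0.85]))
```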

I'll make an ad-hoc post about it as soon as we have a demo, but the method is also integrated in the CLI of our Inseq toolkit as inseq attribute-context: https://inseq.org/en/latest/main_classes/cli.html#attribute-context