gsarti posted an update Feb 7
🔍 Today's pick in Interpretability & Analysis of LMs: INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection by @chaochen et al.

Previous efforts to detect hallucinations using model-intrinsic information relied on predictive uncertainty or self-consistency evaluation. The authors contend that in these procedures the rich semantic information captured in model embeddings is inevitably lost during token decoding.

To prevent this information loss, they propose EigenScore, an internal measure of response self-consistency that uses the eigenvalues of the sampled responses' covariance matrix at intermediate model layers to quantify answer diversity in the dense embedding space.
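For intuition, here is a minimal NumPy sketch of how such a covariance-eigenvalue score could be computed from sampled response embeddings. The function name, the centering over responses, and the regularization constant `alpha` are illustrative assumptions on my part; the paper's exact formulation differs in its details.

```python
import numpy as np

def eigenscore(embeddings: np.ndarray, alpha: float = 1e-3) -> float:
    """Sketch of an EigenScore-style diversity measure.

    embeddings: (K, d) array with one hidden-state embedding per sampled
    response (e.g. taken from an intermediate model layer). Higher scores
    indicate more diverse, less self-consistent answers, which the paper
    associates with hallucination.
    """
    K, d = embeddings.shape
    # Center the embeddings across the K sampled responses.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # K x K Gram matrix; cheaper than the d x d covariance when K << d.
    cov = centered @ centered.T / d
    # Regularize so every eigenvalue is positive and the log is defined.
    eigvals = np.linalg.eigvalsh(cov + alpha * np.eye(K))
    # Mean log-eigenvalue, i.e. log-determinant divided by K.
    return float(np.mean(np.log(eigvals)))
```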

Results show that EigenScore outperforms logit-level methods for hallucination detection on QA tasks, especially when paired with inference-time feature clipping, which truncates extreme activations to reduce overconfident generations.
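Likewise, a rough sketch of what inference-time feature clipping could look like; the quantile-based threshold `q` is an assumed illustrative choice, not necessarily the paper's procedure.

```python
import numpy as np

def clip_features(hidden: np.ndarray, q: float = 0.99) -> np.ndarray:
    """Truncate extreme activations to symmetric quantile thresholds.

    A stand-in for the paper's inference-time feature clipping: overly
    large activations are clipped, which the authors report reduces
    overconfident generations.
    """
    lo, hi = np.quantile(hidden, 1 - q), np.quantile(hidden, q)
    return np.clip(hidden, lo, hi)
```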

📄 Paper: INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection (2402.03744)

could not find their code?


Not released yet afaik!
