Eigenvalues to the rescue?
I found out about this paper thanks to @gsarti's post from last week; I got curious, so I want to post my take on it.
The paper proposes a new metric called EigenScore to detect LLM hallucinations.
Their idea is that, given an input question, they sample K different answers, take the internal sentence embeddings (hidden states) of each answer, compute a covariance matrix over them, and use that matrix to compute the EigenScore.
We can think of the EigenScore as the mean of the log-eigenvalues (i.e., the normalized log-determinant) of the covariance matrix built from the embeddings of the K generated answers.
But why eigenvalues?
Well, if the K generations have similar semantics, the sentence embeddings will be highly correlated, and most eigenvalues will be close to 0.
On the other hand, if the LLM hallucinates, the K generations will have diverse semantics, and the eigenvalues will be significantly different from 0.
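To make the computation concrete, here is a rough NumPy sketch of an EigenScore-style metric as I read it; the choice of sentence embedding (e.g., a last-token hidden state), the Gram-matrix form, and the regularizer alpha are my assumptions, not taken from the authors' code.

```python
import numpy as np

def eigenscore(embeddings: np.ndarray, alpha: float = 1e-3) -> float:
    """Sketch of an EigenScore-style metric.

    embeddings: (K, d) array, one sentence embedding per sampled answer
    (e.g., a middle-layer hidden state of the last generated token).
    """
    K, d = embeddings.shape
    # Center the embeddings so the covariance captures how far the K answers
    # spread apart semantically.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Use the small K x K Gram form; it shares its non-zero eigenvalues
    # with the full d x d covariance matrix.
    cov = centered @ centered.T / d
    # A small regularizer keeps the log finite when the matrix is
    # near-singular (consistent answers -> eigenvalues near 0).
    eigvals = np.linalg.eigvalsh(cov + alpha * np.eye(K))
    # Mean of the log-eigenvalues == normalized log-determinant.
    return float(np.mean(np.log(eigvals)))
```

With this reading, semantically consistent answers give a very negative score (eigenvalues near 0), while diverse, likely hallucinated answers push the score up.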
The idea is pretty neat and shows better results when compared to other methods like sequence probabilities, length-normalized entropy, and other uncertainty quantification-based methods.
What I'm personally missing from the paper is a comparison with other methods like LLM-Eval and SelfCheckGPT. They do mention that EigenScore is much cheaper to run than SelfCheckGPT, but that's all they say on the topic.
Paper: INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection (2402.03744)