gsarti posted an update Jan 19
πŸ” Today's pick in Interpretability & Analysis of LMs: Can Large Language Models Explain Themselves? by @andreasmadsen Sarath Chandar & @sivareddyg

LLMs can provide wrong but convincing explanations for their behavior, which can lead to ill-placed confidence in their predictions. This study uses self-consistency checks to measure the faithfulness of LLM explanations: if an LLM says a set of words is important for making a prediction, it should not be able to make the same prediction once those words are removed. Results demonstrate that the faithfulness of LLM self-explanations cannot be reliably trusted: it proves highly task- and model-dependent, with bigger models generally producing more faithful explanations.
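
For intuition, here is a minimal sketch of the redaction-style self-consistency check described above. It assumes a small off-the-shelf sentiment classifier in place of an LLM, and a hand-picked word set in place of the model's own self-explanation; neither is the authors' exact setup.

```python
# Minimal sketch of a redaction-style faithfulness check.
# Assumption: a small sentiment classifier stands in for the LLM, and
# `claimed_important` stands in for the words a model's self-explanation
# flags as decisive.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # small default SST-2 model

text = "The movie was slow at times, but the acting was truly wonderful."
claimed_important = {"wonderful", "truly"}  # words the explanation flags

# 1) Predict on the full input.
original = classifier(text)[0]

# 2) Redact the supposedly important words and predict again.
redacted = " ".join(
    "[REDACTED]" if w.strip(".,!?").lower() in claimed_important else w
    for w in text.split()
)
counterfactual = classifier(redacted)[0]

# 3) If the prediction survives without its "important" words,
#    the explanation fails the faithfulness check.
faithful = original["label"] != counterfactual["label"]
print(f"full: {original['label']}  redacted: {counterfactual['label']}  "
      f"explanation passes check: {faithful}")
```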

📄 Paper: Can Large Language Models Explain Themselves? (2401.07927)

I quite like using specialized models to test them out too: https://huggingface.co/spaces/TeamTonic/hallucination-test