gsarti
posted an update Feb 2
🔍 Today's pick in Interpretability & Analysis of LMs: ReAGent: Towards A Model-agnostic Feature Attribution Method for Generative Language Models by @casszhao and B. Shan

The authors propose Recursive Attribution Generation (ReAGent), a perturbation-based feature attribution approach designed specifically for generative LMs. The method uses a lightweight encoder LM to replace sampled input spans with valid alternatives, then measures how much each perturbation lowers the predicted probability of the next token. ReAGent is shown to consistently outperform other established approaches across several models and generation tasks in terms of token- and sentence-level faithfulness.
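To make the perturbation idea concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: `toy_target_prob` is a hypothetical stand-in for a generative LM's next-token probability, the fixed `replacements` list stands in for the encoder LM that proposes valid alternatives, and `perturbation_attribution` is an illustrative name. It only shows the general pattern of replacing sampled positions and crediting them with the resulting probability drop.

```python
import random


def toy_target_prob(tokens):
    """Hypothetical stand-in for an LM's probability of the target next token."""
    # Assumption: the toy "model" relies on just two cue words.
    cue_weights = {"capital": 0.3, "France": 0.6}
    return 0.05 + sum(w for cue, w in cue_weights.items() if cue in tokens)


def perturbation_attribution(tokens, prob_fn, replacements,
                             n_iters=200, n_replace=2, seed=0):
    """Average the probability drop observed whenever a position is replaced."""
    rng = random.Random(seed)
    base = prob_fn(tokens)
    totals = [0.0] * len(tokens)
    counts = [0] * len(tokens)
    for _ in range(n_iters):
        # Sample positions to perturb and swap in alternative tokens.
        positions = rng.sample(range(len(tokens)), n_replace)
        perturbed = list(tokens)
        for pos in positions:
            perturbed[pos] = rng.choice(replacements)
        # Credit every perturbed position with the observed drop.
        drop = base - prob_fn(perturbed)
        for pos in positions:
            totals[pos] += drop
            counts[pos] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


tokens = ["The", "capital", "of", "France", "is"]
scores = perturbation_attribution(tokens, toy_target_prob, ["a", "one", "some"])
for token, score in zip(tokens, scores):
    print(f"{token:>8}: {score:.3f}")
```

In this toy setup the highest score should land on "France", the token the stand-in model depends on most. A real setup would replace `toy_target_prob` with a forward pass of the generative model and draw replacements from an encoder LM.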

📄 Paper: ReAGent: Towards A Model-agnostic Feature Attribution Method for Generative Language Models (2402.00794)
💻 Code: https://github.com/casszhao/ReAGent