gsarti posted an update Apr 7
๐Ÿ” Today's pick in Interpretability & Analysis of LMs: ReFT: Representation Finetuning for Language Models by @zhengxuanzenwu @aryaman Z. Wang @atticusg D. Jurafsky @manning @cgpotts

This work introduces Representation Finetuning (ReFT), a framework using learned inference-time interventions as an efficient yet effective alternative to PEFT weight adaptation. LoReFT, a ReFT variant intervening linearly on a low-rank subspace of representations, is evaluated against several PEFT approaches, showing SOTA performance across popular benchmarks while being 10-50x more parameter-efficient. The 🤗-compatible pyreft library is introduced to simplify ReFT usage.
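For intuition, here is a minimal, self-contained PyTorch sketch of the LoReFT edit h ← h + Rᵀ(Wh + b − Rh), where R is a low-rank projection with orthonormal rows and W, b define the target values in that subspace. Class and variable names are my own for illustration; this is not the pyreft API (see the paper and library for the real implementation).

```python
import torch
import torch.nn as nn

class LoReFTIntervention(nn.Module):
    """Illustrative LoReFT-style intervention: a rank-r edit of a hidden state.

    h' = h + R^T (W h + b - R h), with R (r x d) having orthonormal rows.
    Names and dimensions here are assumptions, not the pyreft interface.
    """
    def __init__(self, hidden_dim: int, rank: int):
        super().__init__()
        # R: low-rank projection; orthonormal rows enforced via parametrization.
        self.R = nn.utils.parametrizations.orthogonal(
            nn.Linear(hidden_dim, rank, bias=False)
        )
        # W and b: learned linear map defining the target subspace values.
        self.W = nn.Linear(hidden_dim, rank)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Edit h only inside the r-dimensional subspace spanned by R's rows;
        # (W(h) - R(h)) has shape (..., r), projected back via R's weight.
        return h + (self.W(h) - self.R(h)) @ self.R.weight

if __name__ == "__main__":
    h = torch.randn(2, 8, 768)              # (batch, seq_len, hidden_dim)
    reft = LoReFTIntervention(768, rank=4)
    print(reft(h).shape)                    # torch.Size([2, 8, 768])
```

In ReFT training the base model stays frozen and only intervention parameters like these are learned, which is where the parameter-efficiency gain over weight-adapting PEFT methods comes from.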

This is one of the most convincing practical applications of interpretability methods and insights I've seen in recent years, and I'm looking forward to people combining it with feature-disentanglement methods like SAEs and Backpack LMs to make interventions more interpretable!

📄 Paper: ReFT: Representation Finetuning for Language Models (2404.03592)

๐Ÿ” All daily picks: gsarti/daily-picks-in-interpretability-and-analysis-of-lms-65ae3339949c5675d25de2f9