Today's pick in Interpretability & Analysis of LMs: ReFT: Representation Finetuning for Language Models by @zhengxuanzenwu, @aryaman, Z. Wang, @atticusg, D. Jurafsky, @manning, @cgpotts
This work introduces Representation Finetuning (ReFT), a framework that uses learned inference-time interventions on hidden representations as efficient yet effective alternatives to PEFT weight adaptation. LoReFT, a ReFT variant that intervenes linearly on a low-rank subspace of the representations, is evaluated against several PEFT approaches and shows state-of-the-art performance across popular benchmarks while being 10-50x more parameter-efficient. The Hugging Face-compatible pyreft library is introduced to simplify ReFT usage.
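To make "intervening linearly on a subspace" concrete, here is a minimal NumPy sketch of the LoReFT edit as I read it from the paper: Φ(h) = h + Rᵀ(Wh + b − Rh), where R (r × d) has orthonormal rows spanning the edited subspace and W, b are a learned linear projection. The dimensions, the random initialization and the helper names (`orthonormal_rows`, `loreft`) are illustrative assumptions. There is no training loop here, and this is not pyreft's actual implementation.

```python
import numpy as np

def orthonormal_rows(r, d, rng):
    # Reduced QR of a random d x r matrix gives Q (d x r) with orthonormal columns;
    # its transpose is an r x d matrix with orthonormal rows.
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    return q.T

def loreft(h, R, W, b):
    # LoReFT edit: Phi(h) = h + R^T (W h + b - R h).
    # Only the component of h inside the row space of R is changed.
    return h + R.T @ (W @ h + b - R @ h)

d, r = 16, 4                       # hidden size and rank (illustrative values)
rng = np.random.default_rng(0)
R = orthonormal_rows(r, d, rng)    # would be learned, kept orthonormal
W = rng.standard_normal((r, d))    # would be learned
b = rng.standard_normal(r)         # would be learned
h = rng.standard_normal(d)         # a hidden representation at one position

h_new = loreft(h, R, W, b)

# Trainable parameters per intervention: R and W (r*d each) plus b (r).
n_params = R.size + W.size + b.size
print(n_params)  # 132
```

Two properties follow directly from the formula: projecting the edited vector onto the subspace gives R·Φ(h) = Wh + b (the subspace coordinates are replaced by the learned target), and everything orthogonal to that subspace is left untouched. The parameter count grows with r·d rather than with the size of the weight matrices, which is where the efficiency comes from.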
This is one of the most convincing practical applications of interpretability methods and insights I've seen in recent years, and I'm looking forward to people combining it with feature-disentanglement methods such as SAEs and Backpack LMs to make interventions more interpretable!
Paper: ReFT: Representation Finetuning for Language Models (2404.03592)
All daily picks: gsarti/daily-picks-in-interpretability-and-analysis-of-lms-65ae3339949c5675d25de2f9