Diff Interpretation Tuning

This organization hosts the weight diffs, DIT adapters, and finetuning data used in the paper Learning to Interpret Weight Differences in Language Models (Goel et al. 2025). The paper introduces Diff Interpretation Tuning (DIT), a method that trains a LoRA adapter that can be applied to a finetuned model to get it to describe its own finetuning-induced modifications.
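To make the setup concrete, here is a toy sketch of how a finetuning weight diff and a DIT adapter might stack on one weight matrix. It is not the paper's implementation. It assumes, purely for illustration, that both the weight diff and the DIT adapter are additive low-rank (LoRA-style) updates of the form `B @ A`; all names and numbers are hypothetical.

```python
# Illustrative sketch only: a toy 2x2 "layer" with hypothetical names and values.
# Assumption: the weight diff and the DIT adapter are both additive low-rank updates.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Add two matrices of the same shape elementwise."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def lora_delta(B, A):
    """Low-rank update B @ A, with B of shape (d, r) and A of shape (r, d)."""
    return matmul(B, A)

# Base weights of one toy layer (d = 2).
W_base = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical rank-1 weight diff introduced by finetuning.
B_ft, A_ft = [[1.0], [0.0]], [[0.0, 2.0]]

# Hypothetical rank-1 DIT adapter, applied on top of the finetuned weights.
B_dit, A_dit = [[0.0], [1.0]], [[3.0, 0.0]]

W_finetuned = add(W_base, lora_delta(B_ft, A_ft))
W_with_dit = add(W_finetuned, lora_delta(B_dit, A_dit))
```

In this picture the DIT adapter does not replace the finetuning diff. It is added on top of the finetuned weights, so the resulting model still carries the modification it is asked to describe.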

Paper | Blogpost | Code | Demo Notebook

Teaser image for Diff Interpretation Tuning