diff-interpretation-tuning/loras
This organization hosts the weight diffs, DIT adapters, and finetuning data used in the paper Learning to Interpret Weight Differences in Language Models (Goel et al., 2025). The paper introduces Diff Interpretation Tuning (DIT), a method that trains a LoRA adapter that can be applied to a finetuned model to get it to describe its own finetuning-induced modifications.
Paper | Blogpost | Code | Demo Notebook
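
As a rough illustration of how these artifacts fit together, here is a minimal sketch using transformers and peft. The repo IDs (BASE_ID, WEIGHT_DIFF_ID, DIT_ADAPTER_ID) and the prompt are placeholders, not the actual names used in the paper; the general pattern is to apply a weight diff to the base model, then apply the DIT adapter on top and ask the model about its own finetuning.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder repo IDs -- substitute the actual repos hosted in this organization.
BASE_ID = "org/base-model"            # hypothetical base model
WEIGHT_DIFF_ID = "org/weight-diff"    # hypothetical finetuning weight diff (LoRA)
DIT_ADAPTER_ID = "org/dit-adapter"    # hypothetical DIT adapter

tok = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID)

# Apply the finetuning weight diff and merge it into the base weights,
# producing the finetuned model whose modifications we want described.
finetuned = PeftModel.from_pretrained(base, WEIGHT_DIFF_ID).merge_and_unload()

# Apply the DIT adapter on top of the finetuned weights.
model = PeftModel.from_pretrained(finetuned, DIT_ADAPTER_ID)

# Ask the model to describe its own finetuning-induced changes.
inputs = tok("Describe how you were finetuned.", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100)
print(tok.decode(out[0], skip_special_tokens=True))
```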