gsarti posted an update Apr 9
πŸ” Today's pick in Interpretability & Analysis of LMs: Do language models plan ahead for future tokens? by W. Wu @jxm @lionellevine

This work aims to evaluate whether language models exhibit implicit planning during generation.

The authors propose two hypotheses that could produce planning-like behavior:

- Pre-caching: The model performs computation that is useful for future, but not current, predictions.

- Breadcrumbs: Features contributing to the current prediction happen to also be the ones that improve future predictions.

To test which behavior occurs in practice, the authors note that the off-diagonal gradient terms for weight matrices across the model are the ones responsible for pre-caching, and craft a variant of gradient descent (myopic descent) that removes these terms from the optimization procedure.

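To make the myopic descent idea concrete, here is a minimal PyTorch sketch (not the authors' implementation): it approximates removing the off-diagonal gradient terms by detaching keys/values computed at earlier positions, so the loss at position t cannot shape computation performed for positions before t. All function and variable names are illustrative.

```python
# Hypothetical sketch of "myopic" training: block gradients from a position's
# loss into computation done for earlier positions by detaching past keys/values.
# This is an approximation of removing off-diagonal gradient terms, not the
# paper's exact myopic descent procedure.
import torch
import torch.nn.functional as F

def myopic_causal_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model). Each position attends causally, but keys/values
    from earlier positions are detached, so the loss at position t cannot
    back-propagate into the computation performed for positions < t."""
    q = x @ w_q                       # queries keep full gradient flow
    k_full, v_full = x @ w_k, x @ w_v
    seq_len = x.shape[0]
    outputs = []
    for t in range(seq_len):
        # past keys/values enter the forward pass but carry no gradient
        k = torch.cat([k_full[:t].detach(), k_full[t:t + 1]], dim=0)
        v = torch.cat([v_full[:t].detach(), v_full[t:t + 1]], dim=0)
        attn = F.softmax(q[t:t + 1] @ k.T / k.shape[-1] ** 0.5, dim=-1)
        outputs.append(attn @ v)
    return torch.cat(outputs, dim=0)
```

Under this stop-gradient scheme, any help that earlier positions provide to later predictions can only arise as a side effect of optimizing their own predictions, which is exactly the breadcrumbs scenario the paper contrasts with pre-caching.
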
Using a synthetic dataset, the authors demonstrate that pre-caching does occur in Transformer language models. In natural language settings, however, the LM is observed to leverage breadcrumbs from earlier positions even under myopic training, making the breadcrumbs hypothesis the more plausible account of model behavior.

📄 Paper: Do language models plan ahead for future tokens? (2404.00859)

πŸ” All daily picks: https://huggingface.co/collections/gsarti/daily-picks-in-interpretability-and-analysis-ofc-lms-65ae3339949c5675d25de2f9