niklasstoehr (Niklas Stoehr)

New activity in llamafactory/tiny-random-Llama-3-valuehead about 2 months ago

Any chance you could include the value head weights?

5

#1 opened about 2 months ago by niklasstoehr

upvoted a paper about 2 months ago

Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders

Paper • 2410.22366 • Published Oct 28 • 77

upvoted a paper 9 months ago

Context versus Prior Knowledge in Language Models

Paper • 2404.04633 • Published Apr 6 • 5

reacted to gsarti's post with 👍 9 months ago

Post

2230

🔍 Today's pick in Interpretability & Analysis of LMs: Context versus Prior Knowledge in Language Models by @kdu4108 @vesteinn @niklasstoehr J. C. White A. Schein @rcotterell

This work examines how context and memorized knowledge each influence LMs, measured by how much contexts of varying informativeness shift the model's predictive distribution. Understanding this difference is especially important when memorized and contextual information conflict.

The authors propose disentangling context influence into "persuasion", i.e. how strongly a given context shifts the answers for a query/entity pair, and "susceptibility", i.e. how easily the answers for a query/entity pair are swayed by the presence of context in general. They operationalize both concepts using information-theoretic measures akin to mutual information.
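To make the two notions concrete, here is a minimal sketch, not the authors' implementation (see the linked repository for that). It assumes persuasion can be approximated as the KL divergence between the answer distribution with and without a context, and susceptibility as the mutual information between context and answer, computed as the expected KL divergence from each context-conditioned distribution to their marginal. The function names and the uniform weighting over contexts are illustrative assumptions.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same answers.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def persuasion(p_answer_with_context, p_answer_without_context):
    """Hypothetical persuasion score: how far one context moves the answer distribution."""
    return kl_divergence(p_answer_with_context, p_answer_without_context)

def susceptibility(p_answer_per_context, context_weights=None):
    """Hypothetical susceptibility score: mutual information I(context; answer).

    p_answer_per_context is a list of answer distributions, one per context.
    Contexts are weighted uniformly unless context_weights is given.
    """
    n = len(p_answer_per_context)
    weights = context_weights or [1.0 / n] * n
    num_answers = len(p_answer_per_context[0])
    # Marginal answer distribution, averaging over contexts.
    marginal = [
        sum(w * dist[a] for w, dist in zip(weights, p_answer_per_context))
        for a in range(num_answers)
    ]
    # Mutual information equals the expected KL from each conditional to the marginal.
    return sum(
        w * kl_divergence(dist, marginal)
        for w, dist in zip(weights, p_answer_per_context)
    )

# Toy answer distributions over two candidate answers (made-up numbers).
no_context = [0.9, 0.1]
with_context = [0.2, 0.8]
print(persuasion(with_context, no_context))
print(susceptibility([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]))
```

In practice the answer distributions would come from the model's next-token probabilities for a query with and without the context prepended. A context that leaves the distribution unchanged scores zero persuasion, and a query whose answers never move across contexts scores zero susceptibility.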

The two metrics are validated using a synthetic dataset sourced from a knowledge graph. Analysis shows that: 
- The persuasiveness of relevant contexts increases with model size (interesting implications here for the jailbreaking of LLMs!)
- Assertive contexts tend to be more persuasive for closed queries (yes/no) and mid-sized models
- Negation affects context persuasiveness
- Familiar entities (explored as real vs. fake, more frequent in training data and more connected in the KG) are less susceptible to context influence

Finally, the authors suggest applications of the persuasion/susceptibility framing for social science analyses and gender bias evaluation.

💻 Code: https://github.com/kdu4108/measureLM
📄 Paper: Context versus Prior Knowledge in Language Models (2404.04633)

🔍 All daily picks: https://huggingface.co/collections/gsarti/daily-picks-in-interpretability-and-analysis-ofc-lms-65ae3339949c5675d25de2f9

upvoted a paper 9 months ago

Localizing Paragraph Memorization in Language Models

Paper • 2403.19851 • Published Mar 28 • 13
