AI & ML interests

Natural Language Processing and Computational Linguistics group at the University of Groningen

Recent Activity

wietsedv updated a dataset 6 days ago
GroNLP/squad-nl-v1.1
wietsedv updated a dataset 6 days ago
GroNLP/squad-nl-v2.0
bylin new activity 8 days ago
GroNLP/dutch-cola:Citation

GroNLP's activity

bylin in GroNLP/dutch-cola 8 days ago

Citation
#2 opened 8 days ago by BramVanroy
JRQi updated a Space about 1 month ago
gsarti posted an update 7 months ago

@victor unprompted feature request: I'd love to have a toggle for an HF collection to control whether new items are added to the top or to the bottom. At the moment everything gets added at the bottom, but it would be great to have newer elements on top so that fresh content stays easily accessible without having to scroll all the way down!
gsarti posted an update 8 months ago

🔍 Today's (self-serving) pick in Interpretability & Analysis of LMs:

A Primer on the Inner Workings of Transformer-based Language Models
by @javifer @gsarti @arianna-bis and M. R. Costa-jussà
(@mt-upc, @GroNLP, @facebook)

This primer serves as a comprehensive introduction, aimed at a technical audience, to recent advances in the interpretability of Transformer-based LMs, employing a unified notation to introduce network modules and present state-of-the-art interpretability methods.

Interpretability methods are presented with detailed formulations and grouped into two families: those that localize the inputs or model components responsible for a particular prediction, and those that decode information stored in learned representations. Various insights on the role of specific model components are then summarized alongside recent work using model internals to guide editing and mitigate hallucinations.
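
To make the "localization" family concrete, here is a minimal activation-patching sketch. It assumes GPT-2 loaded through Hugging Face transformers; the prompts and the patched layer are illustrative choices, not taken from the primer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Two prompts that differ in a single token, so hidden states align position-wise.
clean = tok("The capital of France is", return_tensors="pt")
corrupt = tok("The capital of Italy is", return_tensors="pt")
layer = 6  # hypothetical layer to test
block = model.transformer.h[layer]

# 1) Cache the clean run's hidden states at the chosen block.
cache = {}
hook = block.register_forward_hook(lambda m, i, o: cache.update(h=o[0].detach()))
with torch.no_grad():
    model(**clean)
hook.remove()

# 2) Re-run the corrupted prompt, patching in the cached clean activations.
hook = block.register_forward_hook(lambda m, i, o: (cache["h"],) + o[1:])
with torch.no_grad():
    patched = model(**corrupt).logits[0, -1]
hook.remove()

# If the patched run now predicts " Paris", this block carries the country information.
print(tok.decode(patched.argmax().item()))
```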

Finally, the paper provides a detailed picture of the open-source interpretability tools landscape, supporting the need for open-access models to advance interpretability research.

📄 Paper: A Primer on the Inner Workings of Transformer-based Language Models (2405.00208)

🔍 All daily picks: https://huggingface.co/collections/gsarti/daily-picks-in-interpretability-and-analysis-ofc-lms-65ae3339949c5675d25de2f9
gsarti posted an update 8 months ago

🔍 Today's pick in Interpretability & Analysis of LMs: What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation (2404.07129) by @aadityasingh, T. Moskovitz, F. Hill, S. C. Y. Chan, A. M. Saxe (@gatsbyunit)

This work proposes a new methodology inspired by optogenetics (dubbed "clamping") to perform targeted ablations during training to estimate the causal effect of specific interventions on mechanism formation.

The authors use this approach to study the formation of induction heads by training a two-layer attention-only transformer to label examples using context information.
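
The clamping idea can be sketched as follows: force a chosen head's output to zero throughout training and compare the circuits that form against an unclamped run. The toy module below illustrates that recipe and is not the paper's implementation; sizes and the clamped head index are arbitrary:

```python
import torch
import torch.nn as nn

class ClampableAttention(nn.Module):
    """Toy causal multi-head attention whose individual heads can be clamped to zero."""

    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.clamped_heads = set()  # indices of heads whose output is forced to zero

    def forward(self, x):
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (B, T, D) -> (B, n_heads, T, d_head)
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2) for t in (q, k, v))
        scores = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(causal, float("-inf"))
        heads_out = scores.softmax(-1) @ v  # (B, n_heads, T, d_head)
        for h in self.clamped_heads:
            heads_out[:, h] = 0.0  # the "clamp": ablate this head throughout training
        return self.out(heads_out.transpose(1, 2).reshape(B, T, D))

# Clamp head 2 and train as usual; comparing the resulting induction behaviour with an
# unclamped run estimates that head's causal contribution to circuit formation.
attn = ClampableAttention()
attn.clamped_heads.add(2)
x = torch.randn(8, 16, 64)
loss = attn(x).pow(2).mean()  # stand-in for the real in-context labelling loss
loss.backward()
```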

Notable findings:

- The effects of induction heads are additive and redundant, with weaker heads compensating well when a strong induction head is ablated.
- Competition between induction heads might emerge as a product of optimization pressure to converge faster, but it is not strictly necessary as all heads eventually learn to solve the task.
- Previous-token heads (PTHs) influence induction heads in a many-to-many fashion, with any PTH eliciting above-chance predictions from a subsequent induction head.
- Three subcircuits for induction are identified, respectively mixing token-label information (1 + 2), matching the previous occurrence of the current class in the context (3qk + 4), and copying the label of the matched class (3v + 5).
- The formation of induction heads is slowed by larger numbers of classes and labels: more classes slow the formation of the matching mechanism, and more labels slow the formation of the copying mechanism. This may have implications when selecting a vocabulary size for LLMs: larger vocabularies lead to a higher compression ratio and longer contexts, but they might make copying more challenging by delaying the formation of induction heads.

💻 Code: https://github.com/aadityasingh/icl-dynamics

📄 Paper: What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation (2404.07129)

🔍 All daily picks: https://huggingface.co/collections/gsarti/daily-picks-in-interpretability-and-analysis-ofc-lms-65ae3339949c5675d25de2f9
gsarti posted an update 8 months ago

I'm super happy to co-organize the (Mechanistic) Interpretability social at #ICLR2024 with @nikhil07prakash! 🔍

If you plan to attend, help us make this meetup awesome by filling out the form below! 😄

📅 Wed, May 8, 12:45-2:15 PM
🔗 RSVP & share your ideas here: https://forms.gle/FWap4KW2ikdntjfb8
gsarti posted an update 8 months ago

🔍 Today's pick in Interpretability & Analysis of LMs: LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models (2404.07004) by @igortufanov @mahnerak @javifer @lena-voita

The LM Transparency Tool is an open-source toolkit and visual interface for efficiently identifying the component circuits in LMs responsible for their predictions, using the Information Flow Routes approach (Information Flow Routes: Automatically Interpreting Language Models at Scale (2403.00824)).

The tool enables fine-grained customization, highlighting the importance of individual FFN neurons and attention heads. Moreover, vocabulary projections computed with the logit lens are provided to examine intermediate predictions in the residual stream and the tokens promoted by specific component updates.
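
As an illustration of those vocabulary projections, here is a minimal logit-lens sketch, assuming GPT-2 via transformers (this shows the general recipe, not the tool's own code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tok("The Eiffel Tower is located in", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Project each layer's residual stream (last position) into vocabulary space.
for layer, hidden in enumerate(out.hidden_states):
    resid = model.transformer.ln_f(hidden[0, -1])  # final layer norm, then unembedding
    logits = model.lm_head(resid)
    print(layer, tok.decode(logits.argmax().item()))
```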

💻 Code: https://github.com/facebookresearch/llm-transparency-tool

🚀 Demo: facebook/llm-transparency-tool-demo

🔍 All daily picks: https://huggingface.co/collections/gsarti/daily-picks-in-interpretability-and-analysis-ofc-lms-65ae3339949c5675d25de2f9
gsarti posted an update 9 months ago

🔍 Today's pick in Interpretability & Analysis of LMs: x2 edition!

Today's highlighted works aim to reproduce findings from the Transformer-centric interpretability literature on new RNN-based architectures such as Mamba and RWKV:

Does Transformer Interpretability Transfer to RNNs? (2404.05971) by @MrGonao T. Marshall @norabelrose

Locating and Editing Factual Associations in Mamba (2404.03646) by @sensharma @datkinson @davidbau

The first paper applies contrastive activation addition, the tuned lens, and probing for eliciting latent knowledge in quirky models to Mamba and RWKV LMs, finding that these Transformer-specific methods can be applied to these architectures with slight adaptations and yield similar results.
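
In outline, contrastive activation addition builds a steering vector from the difference of hidden states on a contrastive prompt pair and adds it back to the residual stream at inference time. The sketch below shows that recipe on GPT-2 for brevity; the layer, scale, and prompts are arbitrary, and the paper's contribution is adapting such methods to Mamba and RWKV:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
layer, scale = 6, 4.0  # hypothetical injection layer and steering strength
block = model.transformer.h[layer]

def last_hidden(prompt):
    # Hidden state of the last token at the output of `block`.
    with torch.no_grad():
        hs = model(**tok(prompt, return_tensors="pt"), output_hidden_states=True).hidden_states
    return hs[layer + 1][0, -1]

# Steering vector from a contrastive pair of prompts.
steer = last_hidden("I love this movie, it is wonderful.") - last_hidden("I hate this movie, it is terrible.")

def add_steer(module, inputs, output):
    return (output[0] + scale * steer,) + output[1:]

hook = block.register_forward_hook(add_steer)
out = model.generate(**tok("The film was", return_tensors="pt"), max_new_tokens=10, do_sample=False)
hook.remove()
print(tok.decode(out[0]))
```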

The second work applies the ROME method to Mamba, finding that specific weights play the role MLPs have in encoding factual relations across several Mamba layers and can be patched to perform model editing. A new SSM-specific technique is also introduced to emulate attention knockout (value zeroing), revealing information flows similar to those observed in Transformers when processing factual statements.
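
The attention-knockout idea that the SSM technique emulates can be sketched for a standard attention layer: remove a single source-to-target attention edge and measure how the target position's output changes. Shapes and indices below are illustrative, not from the paper:

```python
import torch
import torch.nn.functional as F

def knockout_mask(seq_len, src, tgt):
    # Causal additive mask with one extra edge removed: position `tgt` can no
    # longer read directly from position `src`.
    mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
    mask[tgt, src] = float("-inf")
    return mask

# Compare the attention output at position 4 with and without the 1 -> 4 edge.
q = k = v = torch.randn(1, 1, 5, 8)  # (batch, heads, seq, d_head)
causal = torch.triu(torch.full((5, 5), float("-inf")), diagonal=1)
base = F.scaled_dot_product_attention(q, k, v, attn_mask=causal)
knocked = F.scaled_dot_product_attention(q, k, v, attn_mask=knockout_mask(5, src=1, tgt=4))
print((base - knocked)[0, 0, 4].norm())  # effect of removing the edge on position 4
```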

💻 Code: https://github.com/arnab-api/romba

🔍 All daily picks: https://huggingface.co/collections/gsarti/daily-picks-in-interpretability-and-analysis-ofc-lms-65ae3339949c5675d25de2f9