gsarti posted an update Jan 31
🔍 Today's pick in Interpretability & Analysis of LMs: In-Context Language Learning: Architectures and Algorithms by E. Akyürek, B. Wang, Y. Kim, J. Andreas

This work systematically evaluates in-context learning on formal languages across several model architectures, showing that Transformers outperform all other recurrent and convolutional models, including SSMs. These results are attributed to the presence of “n-gram heads” able to retrieve the token following a context already seen in the current context window and copy it. This idea is further supported by the better ability of Transformer models to encode in-context n-gram frequencies for n>1, and by the higher similarity of Transformer-based LM outputs to classical n-gram models trained on the same data. Finally, these insights are applied to the design of static attention layers mimicking the behavior of n-gram heads, which lead to lower perplexity despite lower computational costs.
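To make the “n-gram head” intuition concrete, here is a minimal Python sketch (not code from the paper or its repository; the function name and token representation are illustrative assumptions) of the retrieve-and-copy behavior described above: find the most recent earlier occurrence of the current (n-1)-token suffix in the context window and predict the token that followed it.

```python
def ngram_head_prediction(tokens, n=2):
    """Toy sketch of the retrieve-and-copy behavior attributed to n-gram heads:
    match the current (n-1)-token suffix against earlier positions in the
    context window and copy the token that followed the most recent match.
    Hypothetical illustration, not the paper's implementation."""
    if len(tokens) < n or n < 2:
        return None
    suffix = tuple(tokens[-(n - 1):])
    # Scan the context right to left for a previous occurrence of the suffix.
    for i in range(len(tokens) - n, -1, -1):
        if tuple(tokens[i:i + n - 1]) == suffix:
            return tokens[i + n - 1]  # copy the token that followed the match
    return None


# With n=2 the suffix is ("b",); the head copies "c", the token that
# previously followed "b" in the context.
print(ngram_head_prediction(["a", "b", "c", "d", "a", "b"]))  # -> "c"
```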

📄 Paper: In-Context Language Learning: Architectures and Algorithms (2401.12973)
💻 Code: https://github.com/berlino/seq_icl

Thanks a lot for sharing these papers!
