arxiv:2406.10906

Breaking the Attention Bottleneck

Published on Jun 16 · Submitted by Bachstelze on Jun 18

Abstract

Attention-based transformers have become the standard architecture in many deep learning fields, primarily due to their ability to model long-range dependencies and to handle variable-length input sequences. However, the attention mechanism, with its quadratic complexity, is a significant bottleneck in the transformer architecture. The mechanism is only uni-directional in the decoder and converges to a static pattern in over-parametrized decoder-only models. I address this issue by developing a generative function that serves as a replacement for attention or for the activation. It retains the auto-regressive character by comparing each token with the previous one. In my test setting with nanoGPT, this yields a smaller loss with a smaller model. The loss drops further when an average context vector is incorporated. This concept of attention replacement is distributed under the GNU AGPL v3 license at https://gitlab.com/Bachstelze/causal_generation.
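For illustration, here is a minimal PyTorch sketch of such a causal replacement, assuming a nanoGPT-style setting: each token is mixed with its immediate predecessor and with a causal running mean of the sequence so far. The class name `CausalGeneration`, the tanh nonlinearity, and the exact way the three terms are combined are assumptions made for this sketch, not the paper's exact formulation (see the linked repository for that).

```python
# Minimal sketch of a causal "attention replacement": each token is combined
# with the previous token and with a causal mean of all tokens seen so far,
# keeping the auto-regressive property without quadratic pairwise attention.
# Names and the exact formulation are illustrative assumptions.
import torch
import torch.nn as nn

class CausalGeneration(nn.Module):
    def __init__(self, n_embd: int):
        super().__init__()
        self.current = nn.Linear(n_embd, n_embd, bias=False)   # mixes the token itself
        self.previous = nn.Linear(n_embd, n_embd, bias=False)  # mixes the previous token
        self.context = nn.Linear(n_embd, n_embd, bias=False)   # mixes the running mean
        self.proj = nn.Linear(n_embd, n_embd, bias=False)      # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_embd)
        B, T, C = x.shape
        # previous token, shifted right by one position (zeros at position 0)
        prev = torch.cat(
            [torch.zeros(B, 1, C, device=x.device, dtype=x.dtype), x[:, :-1, :]], dim=1
        )
        # causal average context: mean over positions 0..t for every position t
        counts = torch.arange(1, T + 1, device=x.device, dtype=x.dtype).view(1, T, 1)
        ctx = x.cumsum(dim=1) / counts
        h = torch.tanh(self.current(x) + self.previous(prev) + self.context(ctx))
        return self.proj(h)
```

Every position only sees itself, its predecessor, and the mean of earlier positions, so the causal masking property of decoder attention is preserved while the per-layer cost stays linear in sequence length.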

Community

Paper author, paper submitter:

A causal activation function is presented which could replace attention in decoders.
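For illustration, a hedged sketch of how such a module could stand in for the attention sub-layer of a decoder block. The `Block` wiring mirrors nanoGPT conventions and reuses the `CausalGeneration` sketch above; it is an assumption, not the paper's code.

```python
# Hypothetical decoder block with the attention sub-layer swapped out for the
# causal generation module sketched above. Layer names follow nanoGPT
# conventions; the exact wiring is an assumption.
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, n_embd: int):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.gen = CausalGeneration(n_embd)  # replaces CausalSelfAttention
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        x = x + self.gen(self.ln_1(x))  # causal mixing instead of attention
        x = x + self.mlp(self.ln_2(x))
        return x
```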

Thanks for sharing. FYI, I don't think you can license a concept, I'm pretty sure you need a patent for that. Copyright is for a specific expression of a concept, not the concept itself. Anyone can implement this with their own code and licence it as they see fit.

Paper author:

I am considering applying for a patent. However, in my legal understanding, a reimplementation doesn't change the copyright. Only a new threshold of originality changes the license; then it isn't a reimplementation anymore. The mission statement of free and human development will be enforced by law.

