arXiv:2305.19534

Recasting Self-Attention with Holographic Reduced Representations

Published on May 31, 2023
Abstract

In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths, the O(T^2) memory and O(T^2 H) compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of T ≥ 100,000 are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so, we perform the same high-level strategy as standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a "Hrrformer", we obtain several benefits including O(T H log H) time complexity, O(T H) space complexity, and convergence in 10× fewer epochs. Nevertheless, the Hrrformer achieves near state-of-the-art accuracy on LRA benchmarks, and we are able to learn with just a single layer. Combined, these benefits make our Hrrformer the first viable Transformer for such long malware classification sequences and up to 280× faster to train on the Long Range Arena benchmark. Code is available at https://github.com/NeuromorphicComputationResearchProgram/Hrrformer
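
The complexity figures in the abstract come from the two core HRR operations: binding a pair of H-dimensional vectors by circular convolution and unbinding with an approximate inverse, both computable with FFTs in O(H log H) per vector. The NumPy sketch below is a simplified illustration of that retrieve-from-a-superposition idea, not the Hrrformer layer from the linked repository; the names bind and unbind and the toy dimensions are ours. The abstract's weighted response over values would sit on top of this retrieval step; the point here is only that no T × T attention matrix is ever materialized.

```python
import numpy as np

def bind(a, b):
    # HRR binding = circular convolution, computed with FFTs in O(H log H).
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=a.shape[-1])

def unbind(s, a):
    # Approximate unbinding: bind with the involution of `a`
    # (a[0], a[H-1], a[H-2], ..., a[1]), which approximately inverts `bind`.
    a_inv = np.roll(a[..., ::-1], 1, axis=-1)
    return bind(s, a_inv)

rng = np.random.default_rng(0)
T, H = 8, 256                                  # toy sequence length and feature size
k = rng.normal(0.0, 1.0 / np.sqrt(H), (T, H))  # keys
v = rng.normal(0.0, 1.0 / np.sqrt(H), (T, H))  # values

# Superpose all key-value bindings into a single H-dimensional memory:
# O(T H log H) time and O(T H) space, with no T x T attention matrix formed.
memory = bind(k, v).sum(axis=0)

# A query that matches key k[0] retrieves a noisy copy of v[0] from the memory.
retrieved = unbind(memory, k[0])
cos = retrieved @ v[0] / (np.linalg.norm(retrieved) * np.linalg.norm(v[0]))
print(f"cosine(retrieved, v[0]) = {cos:.2f}")  # well above chance (~1/sqrt(H)) despite
                                               # interference from the other T-1 bindings
```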
