Associative Recurrent Memory Transformer
Paper • 2407.04841 • Published • 31
Note A neural architecture for very long sequences that requires constant time to process new information at each time step. It combines transformer self-attention for local context with segment-level recurrence for storing task-specific information distributed over a long context. [2R; for continual learning?]
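A minimal sketch of the segment-level recurrence pattern the note describes (my own illustration of the generic recurrent-memory idea, not the ARMT code; ARMT additionally adds an associative memory update on top of this). Memory tokens are prepended to each segment, self-attention handles the local context, and the updated memory states are carried to the next segment, so per-step cost stays constant regardless of total length. All class and variable names here are assumptions.

```python
# Illustrative sketch of segment-level recurrence with memory tokens (RMT-style).
import torch
import torch.nn as nn

class RecurrentMemorySketch(nn.Module):
    def __init__(self, d_model=64, n_mem=4, n_head=4):
        super().__init__()
        # Learnable initial memory tokens carried across segments.
        self.mem = nn.Parameter(torch.randn(1, n_mem, d_model) * 0.02)
        self.block = nn.TransformerEncoderLayer(d_model, n_head, batch_first=True)
        self.n_mem = n_mem

    def forward(self, x, segment_len=32):
        # x: (batch, seq_len, d_model), processed segment by segment.
        mem = self.mem.expand(x.size(0), -1, -1)
        outs = []
        for seg in x.split(segment_len, dim=1):
            h = self.block(torch.cat([mem, seg], dim=1))      # self-attention over [memory; segment]
            mem, out = h[:, :self.n_mem], h[:, self.n_mem:]   # updated memory is the recurrent state
            outs.append(out)
        return torch.cat(outs, dim=1), mem

model = RecurrentMemorySketch()
y, final_mem = model(torch.randn(2, 128, 64))
```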
Note We propose a new class of sequence modeling layers with linear complexity and an expressive hidden state. The key idea is to make the hidden state a machine learning model itself, and the update rule a step of self-supervised learning. Since the hidden state is updated by training even on test sequences, our layers are called Test-Time Training (TTT) layers. [2R; for continual learning? related to RNNs]
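A minimal sketch of that idea (my own illustration, not the authors' implementation): the hidden state is the weight matrix of a tiny linear model, and processing each token performs one gradient step on a self-supervised reconstruction loss before reading the token out through the updated model. The projection names `theta_K`/`theta_V`/`theta_Q` and the class name are assumptions for illustration.

```python
# Sketch of a TTT-style layer: hidden state = weights of a small linear model,
# updated by one gradient step of a self-supervised loss per incoming token.
import numpy as np

class TTTLinearSketch:
    def __init__(self, dim, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = np.zeros((dim, dim))  # hidden state: a linear model
        # Projections defining the self-supervised task (assumed setup).
        self.theta_K = rng.normal(scale=dim**-0.5, size=(dim, dim))  # training view
        self.theta_V = rng.normal(scale=dim**-0.5, size=(dim, dim))  # label view
        self.theta_Q = rng.normal(scale=dim**-0.5, size=(dim, dim))  # readout view
        self.lr = lr

    def step(self, x):
        k, v, q = self.theta_K @ x, self.theta_V @ x, self.theta_Q @ x
        err = self.W @ k - v
        # One gradient step on 0.5 * ||W k - v||^2 with respect to W.
        self.W -= self.lr * np.outer(err, k)
        return self.W @ q  # output through the just-updated hidden state

layer = TTTLinearSketch(dim=8)
outputs = [layer.step(x) for x in np.random.default_rng(1).normal(size=(16, 8))]
```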