arxiv:2406.14213

Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task

Published on Jun 20
· Submitted by alsu-sagirova on Jun 24
#3 Paper of the day

Abstract

Even though Transformers are extensively used for Natural Language Processing tasks, especially for machine translation, they lack an explicit memory to store key concepts of processed texts. This paper explores the properties of the content of a symbolic working memory added to the Transformer model decoder. Such working memory enhances the quality of model predictions in the machine translation task and works as a neural-symbolic representation of information that is important for the model to make correct translations. The study of memory content revealed that keywords of the translated text are stored in the working memory, pointing to the relevance of memory content to the processed text. Moreover, the diversity of tokens and parts of speech stored in memory correlates with the complexity of the corpora for the machine translation task.

Community

Paper author Paper submitter

We trained the Transformer to generate tokens in a symbolic working memory to improve machine translation and investigated how relevant the memory content is to the task.

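To give a concrete feel for the setup, here is a minimal, hypothetical sketch of a decoder that first emits a fixed number of working-memory tokens and only afterwards produces the translation. The `model` call signature, `MEM_LEN`, and the special-token ids are illustrative assumptions, not the authors' implementation (see the repository linked below for the actual code).

```python
import torch

MEM_LEN = 10  # number of working-memory slots (illustrative value, not from the paper)

@torch.no_grad()
def translate_with_memory(model, src_ids, bos_id, eos_id, max_len=128):
    # Greedy decoding sketch: the first MEM_LEN generated positions are treated
    # as symbolic working memory; everything after them is the translation.
    tgt = torch.tensor([[bos_id]], dtype=torch.long)
    for step in range(MEM_LEN + max_len):
        logits = model(src_ids, tgt)                 # assumed shape: (1, tgt_len, vocab)
        next_id = logits[0, -1].argmax().item()
        tgt = torch.cat([tgt, torch.tensor([[next_id]])], dim=1)
        if step >= MEM_LEN and next_id == eos_id:    # allow EOS only once memory is filled
            break
    memory_tokens = tgt[0, 1:1 + MEM_LEN]            # inspectable memory content
    translation = tgt[0, 1 + MEM_LEN:]               # target-language output
    return memory_tokens, translation
```

The memory tokens stay in the decoder context, so later translation steps can attend to them; they are also what the analyses below inspect.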

Paper author Paper submitter

Our main findings
📊 The variety of tokens in memory is higher for more complex inputs but decreases after tuning on a subtask.
🎯 Memory sequences more frequently store content words and functional parts of speech when working on more challenging texts like Winograd schemas or IT docs.
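To make these findings concrete, below is a rough sketch of how one might profile memory contents for token diversity and part-of-speech composition. It assumes the memory tokens are available as plain word strings and uses NLTK's off-the-shelf tagger; the paper's actual analysis pipeline may differ.

```python
from collections import Counter
import nltk

# The tagger resource name can vary across NLTK versions; this is the classic one.
nltk.download("averaged_perceptron_tagger", quiet=True)

def memory_profile(memory_batches):
    """memory_batches: list of per-example lists of memory tokens (plain strings)."""
    tokens = [tok for mem in memory_batches for tok in mem]
    type_token_ratio = len(set(tokens)) / max(len(tokens), 1)   # simple diversity proxy
    pos_counts = Counter(tag for _, tag in nltk.pos_tag(tokens))
    return type_token_ratio, pos_counts

# Toy usage with made-up memory contents:
ttr, pos = memory_profile([["cloud", "server", "install", "the"],
                           ["trophy", "suitcase", "it"]])
print(f"type/token ratio: {ttr:.2f}")
print(pos.most_common(5))
```

A higher type/token ratio and a heavier share of content-word tags would correspond to the behavior reported for more complex corpora.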

I love the paper. I have been thinking along similar lines--giving an LLM the ability to annotate its work--but don't have the technical capacity to pull off an experiment like this. Your implementation, particularly the way you did training, is really elegant. Do you plan on releasing the model and code?

Paper author

@realbenpope Thank you for your interest and warm words!
The method implementation and training code are available here: https://github.com/Aloriosa/gen_work_mem


