arxiv:2406.20086

Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs

Published on Jun 28
· Submitted by gsarti on Jul 2
Authors: Sheridan Feucht, David Atkinson, Byron Wallace, David Bau
Abstract

LLMs process text as sequences of tokens that roughly correspond to words, where less common words are represented by multiple tokens. However, individual tokens are often semantically unrelated to the meanings of the words/concepts they make up. For example, Llama-2-7b's tokenizer splits the word "northeastern" into the tokens ['_n', 'ort', 'he', 'astern'], none of which correspond to semantically meaningful units like "north" or "east." Similarly, the overall meanings of named entities like "Neil Young" and multi-word expressions like "break a leg" cannot be directly inferred from their constituent tokens. Mechanistically, how do LLMs convert such arbitrary groups of tokens into useful higher-level representations? In this work, we find that last token representations of named entities and multi-token words exhibit a pronounced "erasure" effect, where information about previous and current tokens is rapidly forgotten in early layers. Using this observation, we propose a method to "read out" the implicit vocabulary of an autoregressive LLM by examining differences in token representations across layers, and present results of this method for Llama-2-7b and Llama-3-8B. To our knowledge, this is the first attempt to probe the implicit vocabulary of an LLM.
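To make the setup concrete, here is a minimal sketch (not the authors' released code) that tokenizes "northeastern" with Llama-2-7b and tracks how the last token's hidden state changes from layer to layer. The consecutive-layer cosine similarity is an illustrative stand-in for the paper's actual layer-difference scoring, and the gated model id is an assumption:

```python
# Minimal sketch (not the authors' code): tokenize a multi-token word and
# watch how its last-token hidden state changes across layers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed id; gated, needs access approval
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

inputs = tokenizer("northeastern", return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))
# e.g. ['<s>', '▁n', 'ort', 'he', 'astern'] -- no "north" or "east" in sight

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple of (num_layers + 1) tensors, each (1, seq_len, d_model);
# collect the last token's vector at every layer.
last = torch.stack([h[0, -1] for h in out.hidden_states]).float()

# Cosine similarity between consecutive layers at the last token position.
sims = torch.nn.functional.cosine_similarity(last[:-1], last[1:], dim=-1)
for layer, s in enumerate(sims.tolist()):
    print(f"layer {layer:2d} -> {layer + 1:2d}: cos = {s:.3f}")
```

A sharp early-layer drop at the final token of a multi-token word would be the kind of "erasure" signature the abstract describes.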

Community

Hi @gsarti congrats on this work!

Would you be interested in pushing your datasets to the Hub? See here for how to do that: https://huggingface.co/docs/datasets/loading#csv. You can then simply call dataset.push_to_hub to push your datasets, as sketched below.
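For anyone following along, that flow is roughly (the CSV filename and repo id below are just placeholders):

```python
# Minimal sketch: load a local CSV as a Dataset and push it to the Hub.
# "multi_token_words.csv" and the repo id are hypothetical placeholders.
from datasets import load_dataset

dataset = load_dataset("csv", data_files="multi_token_words.csv")
dataset.push_to_hub("your-username/implicit-vocabulary")  # requires `huggingface-cli login` first
```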

Also, it would be great to then link the datasets to this paper page; see here: https://huggingface.co/docs/hub/en/datasets-cards#linking-a-paper

Regarding your model, it would also be great to link it to this paper page; see here for how to do that: https://huggingface.co/docs/hub/en/model-cards#linking-a-paper.

Also, download metrics currently won't work; it's recommended to use PyTorchModelHubMixin instead, as explained here: https://huggingface.co/docs/hub/models-uploading#upload-a-pytorch-model-using-huggingfacehub.
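For reference, the mixin pattern looks roughly like this (the probe class and repo id are made up for illustration):

```python
# Minimal sketch of the PyTorchModelHubMixin pattern from huggingface_hub;
# ErasureProbe and the repo id are hypothetical.
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class ErasureProbe(nn.Module, PyTorchModelHubMixin):
    def __init__(self, hidden_size: int = 4096):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)

    def forward(self, x):
        return self.scorer(x)

model = ErasureProbe()
model.push_to_hub("your-username/erasure-probe")  # uploads weights + config
reloaded = ErasureProbe.from_pretrained("your-username/erasure-probe")
```

Inheriting from the mixin gives the model push_to_hub/from_pretrained methods, so downloads get tracked like any other Hub model.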

Let me know if you need any help.

Cheers!


Hey @nielsr, this is actually not my own paper :) But I hope the authors will release their artifacts here!
