arxiv:2310.15916

In-Context Learning Creates Task Vectors

Published on Oct 24, 2023
· Featured in Daily Papers on Oct 25, 2023

Abstract

In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set S to find a best-fitting function f(x) in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query x and a single "task vector" calculated from the training set. Thus, ICL can be seen as compressing S into a single task vector theta(S) and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.

Community

The results here are about understanding and explaining ICL, so this paper probably won't get as many upvotes as a paper with a more direct application would.

I think that's a pity, because the paper is great: it can build confidence both in how we reason and communicate about ICL and in how we specialize systems to respond well to it.

I think this is super interesting. I love it when people discover these sorts of hidden mechanisms inside models. My summary...

This paper offers insight into how in-context learning works in LLMs, proposing and providing evidence for an elegant structure within the in-context learning process.

The models appear to create a "task vector" that encapsulates the core logic from the demonstration examples, in a way that is independent of any specific query. This vector serves as a compressed representation of the task.

A separate component then takes this task vector and a new query as inputs to generate the output, without directly referencing the original examples.

In essence:

Output = Apply(query, Learn(examples))

Where "Learn" derives the task vector from the examples, and "Apply" utilizes the vector and query to produce the output.

The researchers validated this hypothesis by testing major public models on diverse tasks such as translation and algorithmic reasoning. Key findings:

  • Isolating the Learn and Apply components maintained high accuracy, demonstrating the viability of the separation.
  • Task vectors clustered by task and remained consistent within tasks, indicating they encode meaningful task representations.
  • Injecting a different task's vector caused the model to override contradictory in-context examples and follow the vector instead, highlighting the vector's dominance (see the sketch after this list).
  • The token distributions induced by the vectors contained terms relevant to the task even though those terms were absent from the examples, suggesting the vectors encode task semantics.
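The vector-injection finding can be illustrated with the same hypothetical learn / apply_task_vector helpers from the sketch above: build a task vector from one task's demonstrations, then apply it to a prompt whose in-context examples demonstrate a different task.

```python
# Conflicting-task injection, reusing learn() / apply_task_vector() from above.
# The task vector comes from English->French demonstrations...
theta_fr = learn("apple -> pomme\nbig -> grand\nhouse ->")

# ...while the prompt's own demonstrations show English->Spanish. In the
# paper's experiments the model tends to follow the injected vector rather
# than the examples, so the expected completion here would be the French
# "chat" rather than "gato".
conflicting_prompt = "apple -> manzana\nbig -> grande\ncat ->"
print(apply_task_vector(conflicting_prompt, theta_fr))
```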

Taken together, these results provide substantial evidence that in-context learning involves creating a task vector that encapsulates the examples' logic and then using it to guide behavior on new queries.

While open questions remain regarding implementation details, this is a significant step towards demystifying an interesting AI capability.



Interesting find. Hard prompts are still important for the non-training case.

Would it be possible to precompute some of this to accelerate inference? It would be interesting to see the code.

Paper author

We're in the process of cleaning up the code and plan to release it shortly. Stay tuned!

Paper author

Would it be possible to precompute some of this to accelerate inference? It would be interesting to see the code.

Yes, that's what we're pointing to in our paper when we mention that our findings may have practical implications for the efficient adaptation of LLMs to perform specific tasks.
