arxiv:2310.15916

In-Context Learning Creates Task Vectors

Published on Oct 24, 2023
· Featured in Daily Papers on Oct 25, 2023

Abstract

In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set S to find a best-fitting function f(x) in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query x and a single "task vector" calculated from the training set. Thus, ICL can be seen as compressing S into a single task vector theta(S) and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.

Community

The results here are about understanding and explaining ICL, so this paper probably won't get as many upvotes as a paper with a more direct application would.

I think that's a pity, because the paper is great: it can build confidence both in how we reason and communicate about ICL and in how we specialize systems to respond well to it.

I think this is super interesting. I love it when people discover these sorts of hidden mechanisms inside models. My summary...

This paper offers insight into how in-context learning works in LLMs, proposing and providing evidence for an elegant structure within the in-context learning process.

The models appear to create a "task vector" that encapsulates the core logic from the demonstration examples, in a way that is independent of any specific query. This vector serves as a compressed representation of the task.

A separate component then takes this task vector and a new query as inputs to generate the output, without directly referencing the original examples.

In essence:

Output = Apply(query, Learn(examples))

Where "Learn" derives the task vector from the examples, and "Apply" utilizes the vector and query to produce the output.

The researchers validated this hypothesis by testing major public models on diverse tasks such as translation and algorithmic reasoning. Key findings:

  • Isolating the Learn and Apply components maintained high accuracy, demonstrating the viability of the separation.
  • Task vectors clustered by task and remained consistent within tasks, indicating they encode meaningful task representations.
  • Injecting a different task's vector caused the model to override contradictory in-context examples and follow the vector instead, highlighting the vector's dominance (see the sketch after this list).
  • The token distributions induced by the vectors contained terms relevant to the task even though those terms were absent from the examples, suggesting the vectors encode task semantics.
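The vector-injection finding can be illustrated with the same hypothetical learn / apply_task_vector helpers from the sketch above: build a task vector from one task's demonstrations, then apply it to a prompt whose in-context examples demonstrate a different task.

```python
# Conflicting-task injection, reusing learn() / apply_task_vector() from above.
# The task vector comes from English->French demonstrations...
theta_fr = learn("apple -> pomme\nbig -> grand\nhouse ->")

# ...while the prompt's own demonstrations show English->Spanish. In the
# paper's experiments the model tends to follow the injected vector rather
# than the examples, so the expected completion here would be the French
# "chat" rather than "gato".
conflicting_prompt = "apple -> manzana\nbig -> grande\ncat ->"
print(apply_task_vector(conflicting_prompt, theta_fr))
```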

Taken together, these results provide substantial evidence that in-context learning involves creating a task vector that encapsulates the examples' logic and then using it to guide behavior on new queries.

While open questions remain regarding implementation details, this is a significant step towards demystifying an interesting AI capability.



Interesting find. Hard prompts are still important for the non-training case.

Would it be possible to precompute some of this to accelerate inference? It would be interesting to see the code.

Paper author

We're in the process of cleaning up the code and plan to release it shortly. Stay tuned!

Paper author

Would it be possible to precompute some of this to accelerate inference? It would be interesting to see the code.

Yes, that's what we're pointing to in our paper when we mention that our findings may have practical implications for the efficient adaptation of LLMs to perform specific tasks.
