How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
Abstract
Despite the exceptional capabilities of Large Language Models (LLMs) in knowledge-intensive tasks, there remains a critical gap in understanding how they internalize new knowledge, particularly how acquired knowledge becomes structurally embedded in their neural computations. We address this issue through the lens of knowledge circuit evolution, identifying the computational subgraphs that facilitate knowledge storage and processing. Our systematic analysis of circuit evolution throughout continual pre-training reveals several key findings: (1) the acquisition of new knowledge is influenced by its relevance to pre-existing knowledge; (2) the evolution of knowledge circuits exhibits a distinct phase shift from formation to optimization; (3) the evolution of knowledge circuits follows a deep-to-shallow pattern. These insights not only advance our theoretical understanding of how LLMs acquire new knowledge, but also suggest directions for improving continual pre-training strategies to enhance model performance. Code and data will be available at https://github.com/zjunlp/DynamicKnowledgeCircuits.
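For readers unfamiliar with how circuit components are localized in practice, the sketch below illustrates the general idea with plain activation patching on GPT-2 using the TransformerLens library: restore one attention head's clean activation at a time into a corrupted run and measure how much of the correct answer it recovers. The model, prompts, and scoring here are illustrative assumptions, not the paper's actual circuit-discovery pipeline (see the linked repository for that).

```python
# Minimal activation-patching sketch for locating attention heads that carry a
# stored fact. Illustrative only; the paper's pipeline may differ.
import torch
from transformer_lens import HookedTransformer
from transformer_lens.utils import get_act_name

torch.set_grad_enabled(False)
model = HookedTransformer.from_pretrained("gpt2")  # stand-in for a continually pre-trained checkpoint

clean_prompt = "The Eiffel Tower is located in the city of"    # fact the model should recall
corrupt_prompt = "The Space Needle is located in the city of"  # corrupted subject
answer = model.to_single_token(" Paris")

clean_tokens = model.to_tokens(clean_prompt)
corrupt_tokens = model.to_tokens(corrupt_prompt)

# Cache activations from the clean run.
_, clean_cache = model.run_with_cache(clean_tokens)

def patch_head(z, hook, head):
    # Restore one head's clean output at the final token position of the corrupted run.
    z[:, -1, head, :] = clean_cache[hook.name][:, -1, head, :]
    return z

scores = torch.zeros(model.cfg.n_layers, model.cfg.n_heads)
for layer in range(model.cfg.n_layers):
    for head in range(model.cfg.n_heads):
        logits = model.run_with_hooks(
            corrupt_tokens,
            fwd_hooks=[(get_act_name("z", layer),
                        lambda z, hook, h=head: patch_head(z, hook, h))],
        )
        # Higher score = restoring this head recovers more of the correct answer.
        scores[layer, head] = logits[0, -1, answer].item()

top = scores.flatten().argmax().item()
print(f"Most fact-restoring head: layer {top // model.cfg.n_heads}, head {top % model.cfg.n_heads}")
```

In the paper's setting, this kind of analysis would be repeated at successive continual pre-training checkpoints to track which components enter or leave the circuit over time.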
Community
How do LLMs acquire new knowledge? We address this issue through the lens of knowledge circuit evolution, identifying computational subgraphs that facilitate knowledge storage and processing.
Our systematic analysis of circuit evolution throughout continual pre-training reveals several key findings: (1) the acquisition of new knowledge is influenced by its relevance to pre-existing knowledge; (2) the evolution of knowledge circuits exhibits a distinct phase shift from formation to optimization; (3) the evolution of knowledge circuits follows a deep-to-shallow pattern.
We conduct experiments with decoder-only LLMs and discuss limitations at the end of the paper.
Useful conclusions. The point that "pre-existing is better than new knowledge" confirms my bias about curriculum learning, but it raises the question of how to put the whole internet into a curriculum.
I think focusing on an iterative, domain-aware progression, rather than rigid, linear paths, could be one approach.
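To make that suggestion concrete, here is a small sketch of one way a relevance-aware ordering could look: score each new document by its embedding similarity to a sample of material the model is assumed to already know, then feed the most related documents first. The embedding model, anchor texts, and corpus below are illustrative assumptions, not something proposed in the paper.

```python
# Hypothetical relevance-ordered curriculum: train on documents most related to
# the model's pre-existing knowledge first. Names and data are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Texts representative of knowledge the model is assumed to already have.
anchors = [
    "Paris is the capital of France.",
    "Water boils at 100 degrees Celsius at sea level.",
]
# New continual pre-training documents, initially unordered.
corpus = [
    "Lyon is the third-largest city in France.",
    "Zk-SNARKs enable succinct non-interactive proofs.",
    "Mount Everest is the highest mountain above sea level.",
]

anchor_emb = encoder.encode(anchors, normalize_embeddings=True)
corpus_emb = encoder.encode(corpus, normalize_embeddings=True)

# Relevance of each document = max cosine similarity to any anchor.
relevance = (corpus_emb @ anchor_emb.T).max(axis=1)

# Curriculum order: most related to pre-existing knowledge first, novel material last.
for i in np.argsort(-relevance):
    print(f"{relevance[i]:.2f}  {corpus[i]}")
```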
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension (2025)
- Learning Task Representations from In-Context Learning (2025)
- AnyEdit: Edit Any Knowledge Encoded in Language Models (2025)
- Efficient Knowledge Feeding to Language Models: A Novel Integrated Encoder-Decoder Architecture (2025)
- An Attempt to Unraveling Token Prediction Refinement and Identifying Essential Layers of Large Language Models (2025)
- K-ON: Stacking Knowledge On the Head Layer of Large Language Model (2025)
- Joint Knowledge Editing for Information Enrichment and Probability Promotion (2024)