arxiv:2504.04022

Rethinking Reflection in Pre-Training

Published on Apr 5
· Submitted by Research-EAI on Apr 8

Abstract

A language model's ability to reflect on its own reasoning provides a key advantage for solving complex problems. While most recent research has focused on how this ability develops during reinforcement learning, we show that it actually begins to emerge much earlier - during the model's pre-training. To study this, we introduce deliberate errors into chains-of-thought and test whether the model can still arrive at the correct answer by recognizing and correcting these mistakes. By tracking performance across different stages of pre-training, we observe that this self-correcting ability appears early and improves steadily over time. For instance, an OLMo-2-7B model pre-trained on 4 trillion tokens displays self-correction on our six self-reflection tasks.
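For a concrete sense of the probing setup described above, here is a minimal sketch assuming a Hugging Face causal language model; the model ID, the injected error, the prompt wording, and the success check are illustrative assumptions, not the paper's actual protocol.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model ID; any decoder-only LM on the Hub would work for this sketch.
model_id = "allenai/OLMo-2-1124-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# A chain-of-thought with a deliberately wrong intermediate step (17 * 3 = 51, not 41).
prompt = (
    "Question: A box holds 17 apples. How many apples are in 3 boxes?\n"
    "Reasoning: 17 * 3 = 41."
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated continuation, not the prompt.
continuation = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# Crude check: did the model notice the mistake and recover the correct answer?
print("recovered correct answer:", "51" in continuation)
```

Running a probe like this across successive pre-training checkpoints is how one would track when the self-correction behavior emerges.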

Community

Paper author and submitter:

While most recent research has focused on how this ability develops during reinforcement learning, we show that it actually begins to emerge much earlier, during the model's pre-training.


Great use of training checkpoints for a fully open model! Congrats @Research-EAI and @allenai :)

Very cool research! You could also try our SmolLM2 intermediate checkpoints if it makes sense: https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-intermediate-checkpoints 👀
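If it helps, here is a minimal sketch of loading one of those intermediate checkpoints with transformers; the revision (branch) name below is a hypothetical placeholder and should be replaced with one of the branches listed on the repository page.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "HuggingFaceTB/SmolLM2-1.7B-intermediate-checkpoints"
revision = "step-125000"  # hypothetical branch name; pick a real one from the repo

# Each branch of the repo holds the weights at a given pre-training step.
tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)
```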

BTW, is finding reflection at the pre-training stage something new? I mean, self-critique in the Constitutional AI paper from Anthropic was the first paper that demonstrated the capability of reflection (or self-critique) without post-training.

Paper author:

Thank you very much for your comment. Constitutional AI is a very good paper on using a short list of human-written principles or instructions to improve model safety. One of the main goals of that work is to scale supervision so that AI can be supervised efficiently, and it demonstrates that RLAIF is far more effective than RLHF at aligning a base model. This is separate from our work, where we show that models of varying parameter counts (much smaller than the 52B model considered in the Constitutional AI work) and pre-training compute can reflect and self-correct without any alignment or post-training strategies.

The final part of our Introduction section enumerates our novel contributions.


Models citing this paper: 0
Datasets citing this paper: 6
Spaces citing this paper: 0
Collections including this paper: 6