Abstract
A language model's ability to reflect on its own reasoning provides a key advantage for solving complex problems. While most recent research has focused on how this ability develops during reinforcement learning, we show that it actually begins to emerge much earlier, during the model's pre-training. To study this, we introduce deliberate errors into chains-of-thought and test whether the model can still arrive at the correct answer by recognizing and correcting these mistakes. By tracking performance across different stages of pre-training, we observe that this self-correcting ability appears early and improves steadily over time. For instance, an OLMo2-7B model pre-trained on 4 trillion tokens already displays self-correction on our six self-reflection tasks.
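To make the probe concrete, here is a minimal sketch (not the authors' released code) of the adversarial chain-of-thought setup the abstract describes: prepend a reasoning prefix containing a deliberate error to a question, then check whether a pre-training checkpoint still recovers the correct answer. The checkpoint name, prompt format, and example question are illustrative assumptions.

```python
# Minimal sketch of the adversarial chain-of-thought probe described in the abstract.
# Assumptions: the OLMo-2 checkpoint name, the plain "Question/Reasoning" prompt
# format, and the toy arithmetic question are illustrative, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "allenai/OLMo-2-1124-7B"  # any intermediate pre-training checkpoint could be swapped in
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, device_map="auto")

question = "A store sells pens at 3 dollars each. How much do 4 pens cost?"
gold_answer = "12"
# Adversarial chain-of-thought: the intermediate step contains a deliberate arithmetic error.
corrupted_cot = "Each pen costs 3 dollars, so 4 pens cost 3 * 4 = 11 dollars."

prompt = f"Question: {question}\nReasoning: {corrupted_cot}"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
continuation = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# The checkpoint "self-corrects" if, despite the corrupted reasoning prefix,
# its continuation still produces the correct final answer.
print(continuation)
print("self-corrected:", gold_answer in continuation)
```

Running this across a sequence of intermediate checkpoints would give the kind of over-training trend the abstract refers to, with self-correction rates rising as pre-training progresses.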
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Adaptive Rectification Sampling for Test-Time Compute Scaling (2025)
- Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time (2025)
- Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking (2025)
- The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction (2025)
- LLMs can implicitly learn from mistakes in-context (2025)
- Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs (2025)
- Evaluating Social Biases in LLM Reasoning (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space.
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
Very cool research! You could also try our SmolLM2 intermediate checkpoints if it makes sense: https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-intermediate-checkpoints 👀
BTW, is finding reflection at the pre-training stage something new? I mean, the self-critique in Anthropic's Constitutional AI paper was the first to demonstrate the capability of reflection (or self-critique) without post-training.
Thank you very much for your comment. Constitutional AI is a very good paper on using a short list of human principles or instructions to improve model safety. One of its main goals is to scale supervision so that AI can be supervised efficiently, and it demonstrates that RLAIF is far more effective than RLHF at aligning a base model. This is separate from our work, where we show that models spanning a range of parameter counts (much smaller than the 52B model considered in Constitutional AI) and pre-training compute budgets can reflect and self-correct without any alignment or post-training strategies.
The final part of our Introduction section enumerates our novel contributions.