Abstract
A language model's ability to reflect on its own reasoning provides a key advantage for solving complex problems. While most recent research has focused on how this ability develops during reinforcement learning, we show that it actually begins to emerge much earlier, during the model's pre-training. To study this, we introduce deliberate errors into chains-of-thought and test whether the model can still arrive at the correct answer by recognizing and correcting these mistakes. By tracking performance across different stages of pre-training, we observe that this self-correcting ability appears early and improves steadily over time. For instance, an OLMo2-7B model pre-trained on 4 trillion tokens already displays self-correction on our six self-reflection tasks.
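To make the probe concrete, here is a minimal sketch (not the authors' released code) of the adversarial chain-of-thought setup the abstract describes: prepend a reasoning prefix containing a deliberate error to a question, then check whether a pre-training checkpoint still recovers the correct answer. The checkpoint name, prompt format, and example question are illustrative assumptions.

```python
# Minimal sketch of the adversarial chain-of-thought probe described in the abstract.
# Assumptions: the OLMo-2 checkpoint name, the plain "Question/Reasoning" prompt
# format, and the toy arithmetic question are illustrative, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "allenai/OLMo-2-1124-7B"  # any intermediate pre-training checkpoint could be swapped in
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, device_map="auto")

question = "A store sells pens at 3 dollars each. How much do 4 pens cost?"
gold_answer = "12"
# Adversarial chain-of-thought: the intermediate step contains a deliberate arithmetic error.
corrupted_cot = "Each pen costs 3 dollars, so 4 pens cost 3 * 4 = 11 dollars."

prompt = f"Question: {question}\nReasoning: {corrupted_cot}"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
continuation = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# The checkpoint "self-corrects" if, despite the corrupted reasoning prefix,
# its continuation still produces the correct final answer.
print(continuation)
print("self-corrected:", gold_answer in continuation)
```

Running this across a sequence of intermediate checkpoints would give the kind of over-training trend the abstract refers to, with self-correction rates rising as pre-training progresses.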
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Adaptive Rectification Sampling for Test-Time Compute Scaling (2025)
- Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time (2025)
- Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking (2025)
- The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction (2025)
- LLMs can implicitly learn from mistakes in-context (2025)
- Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs (2025)
- Evaluating Social Biases in LLM Reasoning (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space.
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
Very cool research! You could also try our SmolLM2 intermediate checkpoints if it makes sense: https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-intermediate-checkpoints 👀
BTW, is finding reflection at the pre-training stage something new? I mean, the self-critique in Anthropic's Constitutional AI paper was the first to demonstrate the capability of reflection (or self-critique) without post-training.
Thank you very much for your comment. Constitutional AI is a very good paper on using a short list of human principles or instructions to improve model safety. One of its main goals is to scale supervision so that AI can be supervised efficiently, and it demonstrates that RLAIF is far more effective than RLHF at aligning a base model. This is separate from our work, where we show that models spanning a range of parameter counts (much smaller than the 52B model considered in Constitutional AI) and pre-training compute budgets can reflect and self-correct without any alignment or post-training strategies.
The final part of our Introduction section enumerates our novel contributions.