arxiv:2310.01798

Large Language Models Cannot Self-Correct Reasoning Yet

Published on Oct 3, 2023
· Featured in Daily Papers on Oct 4, 2023
Abstract

Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications. Nevertheless, concerns persist regarding the accuracy and appropriateness of their generated content. A contemporary methodology, self-correction, has been proposed as a remedy to these issues. Building upon this premise, this paper critically examines the role and efficacy of self-correction within LLMs, shedding light on its true potential and limitations. Central to our investigation is the notion of intrinsic self-correction, whereby an LLM attempts to correct its initial responses based solely on its inherent capabilities, without the crutch of external feedback. In the context of reasoning, our research indicates that LLMs struggle to self-correct their responses without external feedback, and at times, their performance might even degrade post self-correction. Drawing from these insights, we offer suggestions for future research and practical applications in this field.

Community

My summary: Can LLMs actually improve their own reasoning by self-correcting mistakes? A new paper from DeepMind and the University of Illinois looks to answer this quantitatively.

The results show that, unaided, LLMs struggle to self-correct on reasoning tasks. The core issue is that LLMs have trouble reliably evaluating the correctness of their own responses: they rarely identify flaws in their initial reasoning. Sometimes LLMs even turn initially correct responses into incorrect ones after self-correction! (I've personally seen this when interacting with ChatGPT many times, and you probably have too.)
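To make the setup concrete: intrinsic self-correction, as the paper frames it, is just a prompting loop in which the model answers, is asked to review its own answer with no external feedback, and then gives a final answer. Here is a minimal sketch in Python; `call_model` is a placeholder for whatever chat-completion client you use, and the prompts are illustrative rather than the paper's exact wording:

```python
def call_model(prompt: str) -> str:
    """Placeholder: send `prompt` to your LLM of choice and return its text reply."""
    raise NotImplementedError("plug in your chat-completion client here")


def intrinsic_self_correct(question: str, rounds: int = 1) -> str:
    # Round 0: the model's initial answer.
    answer = call_model(f"Answer the following question.\n\nQ: {question}\nA:")
    for _ in range(rounds):
        # The same model critiques its own answer -- no external feedback involved.
        critique = call_model(
            f"Q: {question}\nYour previous answer: {answer}\n"
            "Review your previous answer and identify any problems with it."
        )
        # It then produces a (possibly revised) final answer given its own critique.
        answer = call_model(
            f"Q: {question}\nPrevious answer: {answer}\nYour review: {critique}\n"
            "Based on your review, give your final answer."
        )
    return answer
```

The paper's finding is essentially that the review and revision prompts rarely add value: without an external signal, the model has no reliable way to tell a good answer from a bad one.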

More complex techniques like critiquing between LLM instances don't help much either. External feedback or guidance looks necessary to improve reasoning (well, there are some interesting parallels to other work on implicit improvement from preference data vs. traditional RLHF).
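For reference, the multi-instance critique setup can be sketched in the same style. This is a rough illustration of the general multi-agent pattern, not the paper's exact protocol, again using a placeholder `call_model` helper and illustrative prompts:

```python
def call_model(prompt: str) -> str:
    """Placeholder: send `prompt` to your LLM of choice and return its text reply."""
    raise NotImplementedError("plug in your chat-completion client here")


def multi_agent_critique(question: str, n_rounds: int = 2) -> list[str]:
    # Two "agents" (here simply two calls to the same model) each draft an answer.
    answers = [
        call_model(f"Answer the following question.\n\nQ: {question}\nA:")
        for _ in range(2)
    ]
    for _ in range(n_rounds):
        updated = []
        for i, own in enumerate(answers):
            other = answers[1 - i]
            # Each agent sees the other's answer and may revise its own.
            updated.append(call_model(
                f"Q: {question}\nYour answer: {own}\nAnother agent's answer: {other}\n"
                "Considering the other answer, give your updated final answer."
            ))
        answers = updated
    return answers
```

Since both agents share the same blind spots, this mostly reshuffles the same information rather than injecting the external signal the summary above argues is missing.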

Self-correction does show promise for tasks like making responses more polite or safe, though, since the criteria there are more clear-cut.

The authors argue we need to balance enthusiasm with realistic expectations on self-correction. It has a lot of limits for improving reasoning (at least with current models). But they suggest promising directions like incorporating high-quality external feedback from humans, training data, and tools. That could be key to unlocking self-correction's potential down the road.

TLDR: Basically title... LLMs can't reliably self-correct reasoning yet. Maybe hybrid techniques combining self-correction with external guidance could work but we need more research.


