arxiv:2310.20689

Learning From Mistakes Makes LLM Better Reasoner

Published on Oct 31, 2023

Featured in Daily Papers on Nov 1, 2023

Authors: Shengnan An, Zeqi Lin, Jian-Guang Lou, Weizhu Chen

Abstract

Large language models (LLMs) have recently exhibited remarkable reasoning capabilities in solving math problems. To further improve this capability, this work proposes Learning from Mistakes (LeMa), an approach akin to human learning processes. Consider a human student who fails to solve a math problem: the student learns from what mistake was made and how to correct it. Mimicking this error-driven learning process, LeMa fine-tunes LLMs on mistake-correction data pairs generated by GPT-4. Specifically, we first collect inaccurate reasoning paths from various LLMs and then employ GPT-4 as a "corrector" to (1) identify the mistake step, (2) explain the reason for the mistake, and (3) correct the mistake and generate the final answer. Experimental results demonstrate the effectiveness of LeMa: across five backbone LLMs and two mathematical reasoning tasks, LeMa consistently improves performance compared with fine-tuning on chain-of-thought (CoT) data alone. Notably, LeMa also benefits specialized LLMs such as WizardMath and MetaMath, achieving 85.4% pass@1 accuracy on GSM8K and 27.1% on MATH. This surpasses the state-of-the-art (SOTA) performance achieved by non-execution open-source models on these challenging tasks. Our code, data, and models will be publicly available at https://github.com/microsoft/CodeT.
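The mistake-correction data construction described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the prompt wording, the `MistakeRecord` type, the helper names, and the "Final answer:" output convention are all hypothetical. The step that discards corrections whose final answer disagrees with the reference answer is likewise an assumption, since the abstract does not state how corrections are filtered. The call to GPT-4 itself is left out; the sketch only builds the corrector prompt and turns a returned correction into a fine-tuning pair.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MistakeRecord:
    """One inaccurate reasoning path sampled from some LLM (hypothetical schema)."""
    question: str
    wrong_steps: List[str]
    gold_answer: str


# Hypothetical prompt wording; it mirrors the three corrector tasks in the abstract.
CORRECTOR_TEMPLATE = (
    "Question: {question}\n"
    "Student solution:\n{steps}\n\n"
    "1. Identify the first incorrect step.\n"
    "2. Explain why that step is wrong.\n"
    "3. Correct the solution from that step on and end with 'Final answer: <answer>'."
)


def build_corrector_prompt(rec: MistakeRecord) -> str:
    """Number the wrong steps and fill in the corrector prompt."""
    steps = "\n".join(f"Step {i}: {s}" for i, s in enumerate(rec.wrong_steps, 1))
    return CORRECTOR_TEMPLATE.format(question=rec.question, steps=steps)


def extract_final_answer(correction: str) -> Optional[str]:
    """Return the text after the last 'Final answer:' marker, or None if absent."""
    matches = re.findall(r"Final answer:\s*(.+)", correction)
    return matches[-1].strip() if matches else None


def to_finetune_pair(rec: MistakeRecord, correction: str) -> Optional[Dict[str, str]]:
    """Keep a correction only if its final answer matches the reference (assumed filter)."""
    if extract_final_answer(correction) != rec.gold_answer.strip():
        return None
    return {"input": build_corrector_prompt(rec), "output": correction}


# Toy example with a hand-written correction standing in for the corrector's reply.
rec = MistakeRecord(
    question="Tom has 3 apples and buys 4 more. How many apples does he have?",
    wrong_steps=["Tom starts with 3 apples.", "3 + 4 = 8, so he has 8 apples."],
    gold_answer="7",
)
correction = (
    "Incorrect step: Step 2\n"
    "Explanation: 3 + 4 equals 7, not 8.\n"
    "Correct solution: Step 2: 3 + 4 = 7, so he has 7 apples.\n"
    "Final answer: 7"
)
pair = to_finetune_pair(rec, correction)
```

In this sketch, the resulting `pair` would be mixed with ordinary CoT examples when fine-tuning, which matches the abstract's comparison against fine-tuning on CoT data alone.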

Community

JoelKessels

Nov 2, 2023

If all human methods are applicable to LLMs, why not create an LLM that adjusts other LLMs based on arXiv papers, for example, and posts the suggestions to Git as code upgrades? That could apply all the improvements at once, creating a higher and more efficient performance baseline.