arxiv:2212.10403

Towards Reasoning in Large Language Models: A Survey

Published on Dec 20, 2022
Authors: Jie Huang, Kevin Chen-Chuan Chang

Abstract

Reasoning is a fundamental aspect of human intelligence that plays a crucial role in activities such as problem solving, decision making, and critical thinking. In recent years, large language models (LLMs) have made significant progress in natural language processing, and it has been observed that these models may exhibit reasoning abilities when they are sufficiently large. However, it is not yet clear to what extent LLMs are capable of reasoning. This paper provides a comprehensive overview of the current state of knowledge on reasoning in LLMs, including techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, findings and implications of previous research in this field, and suggestions for future directions. Our aim is to provide a detailed and up-to-date review of this topic and stimulate meaningful discussion and future work.

Community

Surveys what reasoning is, methods for improving and eliciting reasoning in LLMs, evaluation and analysis, and future directions for LLM reasoning research. From UIUC.

- What is reasoning: deductive (conclusion follows from premises), inductive (conclusion generalized from observations), abductive (conclusion is the most plausible explanation of observations, drawing on background knowledge), and formal (structured, rule-following) vs. informal (unstructured, open-ended).
- Fully supervised fine-tuning: fine-tune on a dataset containing explicit reasoning traces (rationales).
- Prompting / in-context learning (ICL): chain-of-thought (CoT), zero-shot CoT (add "Let's think step by step" to the prompt), and intermediate scratchpads; a minimal prompt-construction sketch follows this list.
- Rationale exploration and verification: consistency checks over sampled rationales (tree-of-thought came later) and augmented CoT (a rationale in each exemplar); see the self-consistency sketch below.
- Problem decomposition: divide-and-conquer techniques such as least-to-most, decomposed, and successive prompting (sketched below); others cascade LLMs to break up the query, handle sub-tasks, verify, and combine the results.
- Evaluation: end-task performance on arithmetic reasoning (MathQA, GSM8K, etc.), commonsense reasoning over everyday knowledge (CSQA, StrategyQA), symbolic reasoning with simple rules over symbols (Coin Flip), broad task suites (BIG-bench), and generalization (SCAN); reasoning quality can be analyzed with step-by-step metrics (ROSCOE) or by checking whether conclusions follow from premises (FOLIO).
- Findings: reasoning may be an emergent ability of sufficiently large LLMs and CoT may elicit it, but LLMs remain weak at complex reasoning; some apparent reasoning may simply reflect training data and memorization (the models still hallucinate).
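
The prompting variants above differ mainly in how the prompt string is assembled. Below is a minimal sketch of few-shot CoT and zero-shot CoT prompt construction; `generate` is a hypothetical wrapper around whichever LLM API is in use (it is not something defined by the paper), and the exemplar is the familiar tennis-ball problem, used only for illustration.

```python
# Minimal sketch of chain-of-thought prompt construction. `generate()` is a
# hypothetical callable that sends a prompt to an LLM and returns the text
# completion; swap in whatever API client you actually use.

COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def cot_prompt(question: str) -> str:
    # Few-shot CoT: each exemplar shows the rationale, not just the answer.
    return f"{COT_EXEMPLAR}Q: {question}\nA:"

def zero_shot_cot_prompt(question: str) -> str:
    # Zero-shot CoT: no exemplars, just the trigger phrase after the question.
    return f"Q: {question}\nA: Let's think step by step."

# Usage (with your own `generate`):
#   completion = generate(cot_prompt("..."))
#   completion = generate(zero_shot_cot_prompt("..."))
```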

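For the rationale-verification point, the standard consistency check is self-consistency: sample several reasoning chains at non-zero temperature, extract each chain's final answer, and take a majority vote. A minimal sketch, assuming the same hypothetical `generate` callable (here accepting an optional sampling temperature) and a last-number answer-extraction convention common for arithmetic benchmarks such as GSM8K:

```python
import re
from collections import Counter

def extract_answer(completion: str) -> str | None:
    # Convention: treat the last number in the completion as the final answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None

def self_consistency(question: str, generate, n_samples: int = 10) -> str | None:
    # Sample several chains of thought, then majority-vote their final answers.
    prompt = f"Q: {question}\nA: Let's think step by step."
    answers = []
    for _ in range(n_samples):
        completion = generate(prompt, temperature=0.7)  # diverse rationales
        answer = extract_answer(completion)
        if answer is not None:
            answers.append(answer)
    return Counter(answers).most_common(1)[0][0] if answers else None
```

Likewise, a rough shape for the problem-decomposition methods (least-to-most and relatives): first ask the model to split the problem into sub-questions, then answer them in order while feeding earlier answers back into the context. The decomposition prompt wording here is illustrative, not a template from the paper:

```python
def least_to_most(question: str, generate) -> str:
    # Stage 1: decompose the problem into simpler sub-questions, one per line.
    decomposition = generate(
        "Break the following problem into simpler sub-questions, one per line:\n"
        f"{question}"
    )
    sub_questions = [line.strip() for line in decomposition.splitlines() if line.strip()]

    # Stage 2: solve the sub-questions in order, accumulating earlier answers
    # in the context so later sub-questions can build on them.
    context = f"Problem: {question}\n"
    answer = ""
    for sub_q in sub_questions:
        answer = generate(f"{context}Q: {sub_q}\nA:")
        context += f"Q: {sub_q}\nA: {answer}\n"

    # The answer to the last (hardest) sub-question is taken as the final answer.
    return answer
```
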
Links: PapersWithCode, GitHub
