arxiv:2411.14251

Natural Language Reinforcement Learning

Published on Nov 21

Submitted by Benjamin-eecs on Nov 22

Authors:

Bo Liu ,

Abstract

Reinforcement Learning (RL) mathematically formulates decision-making as a Markov Decision Process (MDP). With MDPs, researchers have achieved remarkable breakthroughs across various domains, including games, robotics, and language models. This paper explores a new possibility, Natural Language Reinforcement Learning (NLRL), by extending the traditional MDP to a natural language-based representation space. Specifically, NLRL redefines RL principles, including task objectives, policy, value function, Bellman equation, and policy iteration, as their language counterparts. With recent advancements in large language models (LLMs), NLRL can be practically implemented to achieve RL-like policy and value improvement through either pure prompting or gradient-based training. Experiments on Maze, Breakthrough, and Tic-Tac-Toe games demonstrate the effectiveness, efficiency, and interpretability of the NLRL framework across diverse use cases. Our code will be released at https://github.com/waterhorse1/Natural-language-RL.
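For readers less familiar with the classical components the abstract lists (value function, Bellman equation, policy iteration), the following is a minimal sketch of standard tabular policy iteration. It is not NLRL and not the authors' code: the four-cell corridor MDP, the discount factor, and every function name are hypothetical choices made for illustration. According to the abstract, NLRL replaces these numeric objects with language counterparts.

```python
from typing import List, Tuple

# Hypothetical toy MDP (an assumption, not from the paper):
# a 4-cell corridor where cell 3 is the terminal goal.
N_STATES = 4
GOAL = 3
ACTIONS = (-1, +1)  # move left, move right
GAMMA = 0.9         # discount factor


def step(state: int, action: int) -> Tuple[int, float]:
    """Deterministic transition: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def evaluate(policy: List[int], sweeps: int = 100) -> List[float]:
    """Policy evaluation: repeatedly apply the Bellman expectation backup
    V(s) <- r(s, pi(s)) + gamma * V(s')."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES):
            if s == GOAL:
                continue  # the terminal state keeps value 0
            s_next, r = step(s, policy[s])
            values[s] = r + GAMMA * values[s_next]
    return values


def q_value(values: List[float], s: int, a: int) -> float:
    """One-step lookahead value of taking action a in state s."""
    s_next, r = step(s, a)
    return r + GAMMA * values[s_next]


def improve(values: List[float]) -> List[int]:
    """Policy improvement: act greedily with respect to the lookahead."""
    policy = []
    for s in range(N_STATES):
        if s == GOAL:
            policy.append(+1)  # placeholder; never used at the terminal cell
            continue
        policy.append(max(ACTIONS, key=lambda a: q_value(values, s, a)))
    return policy


def policy_iteration() -> Tuple[List[int], List[float]]:
    """Alternate evaluation and improvement until the policy is stable."""
    policy = [-1] * N_STATES  # start from "always move left"
    while True:
        values = evaluate(policy)
        new_policy = improve(values)
        if new_policy == policy:
            return policy, values
        policy = new_policy


if __name__ == "__main__":
    final_policy, final_values = policy_iteration()
    print(final_policy, final_values)
```

In this toy setting the loop settles on moving right in every non-terminal cell, with values that decay by a factor of `GAMMA` per step away from the goal.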

Community

Benjamin-eecs (paper author, paper submitter), about 11 hours ago, edited about 10 hours ago

NLRL expands the scope of general sequential decision-making by moving beyond scalar rewards to leverage rich multimodal signals, particularly natural language. This approach enables agents to generalize across tasks and domains while generating high-quality interaction data. Though exemplified with language tasks, NLRL is a versatile framework that can scale to general decision-making scenarios in various modalities, improving interpretability and efficiency in solving complex sequential tasks.
