arxiv:2209.07288

Optimistic Curiosity Exploration and Conservative Exploitation with Linear Reward Shaping

Published on Sep 15, 2022

Abstract

In this work, we study the simple yet universally applicable case of reward shaping in value-based Deep Reinforcement Learning (DRL). We show that reward shifting in the form of a linear transformation is equivalent to changing the initialization of the Q-function in function approximation. Based on this equivalence, we bring the key insight that a positive reward shift leads to conservative exploitation, while a negative reward shift leads to curiosity-driven exploration. Accordingly, conservative exploitation improves offline RL value estimation, and optimistic value estimation improves exploration for online RL. We validate our insight on a range of RL tasks and show improvements over baselines: (1) in offline RL, conservative exploitation improves the performance of off-the-shelf algorithms; (2) in online continuous control, multiple value functions with different shifting constants can be used to tackle the exploration-exploitation dilemma for better sample efficiency; (3) in discrete control tasks, a negative reward shift yields an improvement over the curiosity-based exploration method.
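The equivalence between reward shifting and Q-function initialization can be illustrated with a small tabular sketch. It is not the paper's implementation. It assumes an infinite-horizon discounted MDP with no terminal states and a constant shift b added to every reward, in which case the optimal Q-values move by b / (1 - gamma) and the greedy policy is unchanged. Learning the shifted Q-function from a zero initialization then corresponds to learning the original one from an initialization of -b / (1 - gamma): lower (pessimistic) when b > 0, higher (optimistic) when b < 0. The random MDP and all names below (`q_value_iteration`, `P`, `R`, `b`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def q_value_iteration(P, R, gamma, iters=2000):
    """Tabular Q-value iteration.

    P: transition probabilities, shape (S, A, S)
    R: rewards, shape (S, A)
    """
    Q = np.zeros_like(R)
    for _ in range(iters):
        V = Q.max(axis=1)          # greedy state values, shape (S,)
        Q = R + gamma * (P @ V)    # Bellman optimality backup, shape (S, A)
    return Q

# Hypothetical random MDP (an assumption for illustration only).
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
b = -1.0                           # negative shift: the "optimistic" case
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into valid distributions
R = rng.random((S, A))

Q = q_value_iteration(P, R, gamma)
Q_shifted = q_value_iteration(P, R + b, gamma)

# Every Q-value moves by the same constant b / (1 - gamma).
offset = b / (1.0 - gamma)
same_offset = np.allclose(Q_shifted - Q, offset)

# The greedy policy is therefore identical under both reward functions.
same_policy = bool((Q.argmax(axis=1) == Q_shifted.argmax(axis=1)).all())
```

In this sketch the shift changes only where the value estimates start relative to their fixed point, not which policy is optimal. With function approximation, that is where the behavior during learning can differ.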
