arxiv:2303.08059

Fast Rates for Maximum Entropy Exploration

Published on Mar 14, 2023

Abstract

We address the challenge of exploration in reinforcement learning (RL) when the agent operates in an unknown environment with sparse or no rewards. In this work, we study two types of the maximum entropy exploration problem. The first type is visitation entropy maximization, previously considered by Hazan et al. (2019) in the discounted setting. For this type of exploration, we propose a game-theoretic algorithm with sample complexity $\mathcal{O}(H^3 S^2 A / \varepsilon^2)$, thus improving the $\varepsilon$-dependence upon existing results, where $S$ is the number of states, $A$ is the number of actions, $H$ is the episode length, and $\varepsilon$ is the desired accuracy. The second type of entropy we study is the trajectory entropy. This objective function is closely related to entropy-regularized MDPs, and we propose a simple algorithm with sample complexity of order $\mathcal{O}(\mathrm{poly}(S, A, H) / \varepsilon)$. Interestingly, it is the first theoretical result in the RL literature establishing a potential statistical advantage of regularized MDPs for exploration. Finally, we apply the developed regularization techniques to reduce the sample complexity of visitation entropy maximization to $\mathcal{O}(H^2 S A / \varepsilon^2)$, yielding a statistical separation between maximum entropy exploration and reward-free exploration.
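
To make the visitation entropy objective concrete, the following Python sketch computes the average state-visitation distribution of a fixed policy in a small tabular, finite-horizon MDP and evaluates its Shannon entropy, i.e. the quantity that visitation entropy exploration seeks to maximize over policies. The environment, policy, and variable names are illustrative assumptions and this is not the paper's algorithm.

import numpy as np

# Illustrative sketch (not the paper's algorithm): evaluate the visitation
# entropy of a fixed policy in a small tabular, finite-horizon MDP.
rng = np.random.default_rng(0)
S, A, H_len = 5, 2, 10  # states, actions, horizon (all assumed for illustration)

# Random transition kernel P[h, s, a] -> distribution over next states,
# and a uniform policy pi[h, s] -> distribution over actions.
P = rng.dirichlet(np.ones(S), size=(H_len, S, A))
pi = np.full((H_len, S, A), 1.0 / A)

# Forward pass: d[h, s] = probability of being in state s at step h.
d = np.zeros((H_len, S))
d[0, 0] = 1.0  # assume a fixed initial state
for h in range(H_len - 1):
    joint = (d[h][:, None] * pi[h]).reshape(S * A)   # state-action occupancy at step h
    d[h + 1] = joint @ P[h].reshape(S * A, S)        # push forward through the dynamics

# Average state-visitation distribution and its Shannon entropy.
d_avg = d.mean(axis=0)
visitation_entropy = -np.sum(d_avg * np.log(d_avg + 1e-12))
print(visitation_entropy)

A maximum entropy exploration algorithm would search for the policy making this entropy as large as possible; the sketch only evaluates the objective for one fixed policy.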
