arxiv:2212.14449

Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games

Published on Dec 29, 2022

Abstract

Mean-field games have been used as a theoretical tool to obtain an approximate Nash equilibrium for symmetric and anonymous N-player games. However, existing theoretical results assume variations of a "population generative model", which allows the learning algorithm to modify the population distribution arbitrarily and thus limits applicability. Moreover, learning algorithms typically work on abstract simulators of the population rather than on the N-player game itself. Instead, we show that N agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\mathcal{O}(\varepsilon^{-2})$ samples from a single sample trajectory without a population generative model, up to a standard $\mathcal{O}(1/\sqrt{N})$ error due to the mean-field approximation. Taking a divergent approach from the literature, instead of working with the best-response map, we first show that a policy mirror ascent map can be used to construct a contractive operator having the Nash equilibrium as its fixed point. We then analyze single-path TD learning for N-agent games, proving sample complexity guarantees using only a sample path from the N-agent simulator, without a population generative model. Furthermore, we demonstrate that our methodology allows for independent learning by N agents with finite-sample guarantees.
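To make the fixed-point view concrete, here is a minimal, illustrative sketch (not the authors' implementation) of iterating an entropy-regularized policy mirror ascent map on a toy congestion-style mean-field game until it approximately reaches a fixed point. The game, the reward, and the exact soft policy evaluation below are assumed stand-ins; in the paper, Q-values are instead estimated from a single sample path with TD learning, and each of the N agents runs the update independently.

```python
# Minimal sketch, not the authors' code: iterate an entropy-regularized policy
# mirror ascent map on a toy congestion-style mean-field game and stop when it
# (approximately) reaches a fixed point. The reward, transition kernel, and
# exact soft policy evaluation are illustrative stand-ins; the paper estimates
# Q-values from a single sample path with TD learning instead.
import numpy as np

rng = np.random.default_rng(0)
S, A = 4, 3                      # toy numbers of states and actions
tau, eta, gamma = 0.5, 0.2, 0.9  # entropy temperature, step size, discount

P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel, shape (S, A, S)
r_base = rng.uniform(size=(S, A))            # base reward r(s, a)

def population(pi, iters=200):
    """Stationary state distribution induced when the whole population plays pi."""
    mu = np.full(S, 1.0 / S)
    for _ in range(iters):
        mu = np.einsum('s,sa,sat->t', mu, pi, P)
    return mu

def soft_q(pi, mu, iters=200):
    """Toy soft policy evaluation: congestion reward r(s,a) - mu[s], computed
    exactly here (the paper uses single-path TD learning for this step)."""
    r = r_base - mu[:, None]                 # reward shrinks as a state gets crowded
    q = np.zeros((S, A))
    for _ in range(iters):
        v = np.einsum('sa,sa->s', pi, q - tau * np.log(pi + 1e-12))
        q = r + gamma * np.einsum('sat,t->sa', P, v)
    return q

def mirror_ascent_step(pi, q):
    """KL-geometry mirror ascent on the entropy-regularized linearized objective
    argmax_p eta*<q(s,.), p> + eta*tau*H(p) - KL(p || pi(.|s)),
    whose closed form is p ∝ pi^(1/(1+eta*tau)) * exp(eta*q / (1+eta*tau))."""
    logits = (np.log(pi + 1e-12) + eta * q) / (1.0 + eta * tau)
    logits -= logits.max(axis=1, keepdims=True)   # for numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

pi = np.full((S, A), 1.0 / A)                # start from the uniform policy
for k in range(500):
    mu = population(pi)                      # mean field induced by the current policy
    new_pi = mirror_ascent_step(pi, soft_q(pi, mu))
    gap = np.abs(new_pi - pi).max()
    pi = new_pi
    if gap < 1e-8:                           # approximate fixed point of the map
        print(f"fixed point reached after {k + 1} iterations")
        break
```

The point mirrored from the abstract is that the mirror ascent update map itself, rather than a best-response map, is iterated to a fixed point; establishing contraction of that map under regularization is what yields the Nash equilibrium as the fixed point.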
