\section{Related Work}

\paragraph{Deep Reinforcement Learning for Atari Games}
The seminal work of \citet{mnih2013playing} introduced the first deep learning model to learn control policies directly from high-dimensional sensory input using reinforcement learning. Their model outperformed all previous approaches on six of the seven Atari 2600 games it was evaluated on and surpassed a human expert on three of them. The same group later proposed asynchronous gradient descent for optimizing deep neural network controllers, demonstrating success on a wide variety of continuous motor-control problems as well as the new task of navigating random 3D mazes from visual input \citep{mnih2016asynchronous}. These value-based approaches, however, suffer from overestimation in the value-function approximation, which \citet{hasselt2015deep} addressed with a specific adaptation of the DQN algorithm (Double DQN), yielding substantially better performance on several games.
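For concreteness, the overestimation fix of \citet{hasselt2015deep} decouples action selection from action evaluation in the bootstrap target. In standard DQN notation (with online parameters $\theta_t$ and target-network parameters $\theta_t^{-}$; the notation here is ours, not the paper's verbatim formulation):
\begin{equation*}
y_t^{\mathrm{DQN}} = r_{t+1} + \gamma \max_{a} Q\!\left(s_{t+1}, a; \theta_t^{-}\right),
\qquad
y_t^{\mathrm{Double}} = r_{t+1} + \gamma\, Q\!\Bigl(s_{t+1},\, \operatorname*{arg\,max}_{a} Q\!\left(s_{t+1}, a; \theta_t\right);\, \theta_t^{-}\Bigr),
\end{equation*}
so the online network selects the action while the target network evaluates it, removing the upward bias introduced by the shared $\max$ operator.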

\paragraph{Decentralized Reinforcement Learning}
Decentralized reinforcement learning has been studied in several contexts. \citet{lu2021decentralized} proposed Safe Dec-PG, a decentralized policy-gradient (PG) method that performs policy optimization for a decentralized constrained Markov decision process (D-CMDP) over a communication network; it was the first decentralized PG algorithm to handle coupled safety constraints with a quantifiable convergence rate in multi-agent reinforcement learning. \citet{lei2022adaptive} introduced an adaptive stochastic incremental ADMM (asI-ADMM) algorithm for decentralized RL in edge-computing-empowered IoT networks, improving on the state of the art in communication cost and scalability. \citet{lyu2021contrasting}, however, highlighted misconceptions about centralized critics in the literature, emphasizing that centralized and decentralized critics each carry distinct trade-offs that algorithm designers should weigh.
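Methods in this family typically target a networked constrained objective. A generic form (our notation, not the exact formulation of \citet{lu2021decentralized}) is
\begin{equation*}
\max_{\theta}\; \frac{1}{N}\sum_{i=1}^{N} J_i(\theta)
\quad \text{subject to} \quad
\frac{1}{N}\sum_{i=1}^{N} J_i^{c}(\theta) \le c,
\end{equation*}
where $J_i$ is the expected return observed by agent $i$, $J_i^{c}$ the corresponding expected safety cost, and the $N$ agents must reach consensus on the shared policy parameters $\theta$ using only local communication.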

\paragraph{Game Theory and Multi-Agent Reinforcement Learning}
Game theory has been widely combined with reinforcement learning to tackle multi-agent problems. \citet{yin2022air} proposed an algorithm based on deep reinforcement learning and game theory to compute Nash equilibrium strategies in highly competitive environments, demonstrating good convergence in simulation. \citet{adams2020resolving} addressed implicit coordination in multi-agent deep reinforcement learning by combining Deep Q-Networks for policy learning with Nash equilibrium for action selection. In autonomous driving, \citet{duan2022autonomous} proposed a driving model based on game theory and reinforcement learning that enables multi-agent cooperative driving with strategic reasoning and negotiation in traffic scenarios. These approaches, however, often require expensive equilibrium computations and may not scale to large problems.
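The solution concept underlying these methods is the standard Nash equilibrium of a Markov game: a joint policy $(\pi_1^{*}, \dots, \pi_N^{*})$ from which no agent benefits by deviating unilaterally,
\begin{equation*}
V_i\!\left(\pi_i^{*}, \pi_{-i}^{*}\right) \;\ge\; V_i\!\left(\pi_i, \pi_{-i}^{*}\right)
\quad \forall\, \pi_i,\; \forall\, i \in \{1, \dots, N\},
\end{equation*}
where $V_i$ denotes agent $i$'s expected return and $\pi_{-i}^{*}$ the fixed equilibrium policies of all other agents. Computing such equilibria exactly is what limits scalability in practice.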

\paragraph{Decentralized Learning with Communication Constraints}
A central challenge in decentralized learning is handling communication constraints. \citet{kong2021consensus} showed that decentralized training converges as fast as its centralized counterpart whenever the consensus distance during training stays below a critical quantity, providing guidance for designing better decentralized training schemes. \citet{fu2022automatic} proposed a decentralized ensemble learning framework for automatic modulation classification that reduces communication overhead while maintaining comparable classification performance. For multi-agent systems, \citet{su2022ma2ql} introduced MA2QL, a minimalist approach to fully decentralized cooperative MARL with a theoretical guarantee of convergence to a Nash equilibrium when each agent achieves $\varepsilon$-convergence at its turn. These methods may nonetheless struggle in highly dynamic and complex environments.
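The critical quantity of \citet{kong2021consensus} is expressed through the consensus distance, which measures how far the workers' parameters $x_i^{(t)}$ have drifted from their network average $\bar{x}_t$; in our notation (a sketch of the standard definition, not a verbatim quote of the paper),
\begin{equation*}
\Xi_t = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \bigl\lVert \bar{x}_t - x_i^{(t)} \bigr\rVert^2},
\qquad
\bar{x}_t = \frac{1}{N}\sum_{i=1}^{N} x_i^{(t)},
\end{equation*}
so keeping $\Xi_t$ below the critical threshold recovers centralized-speed convergence.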

\paragraph{Decentralized Collision Avoidance}
Decentralized collision avoidance has been an important application of reinforcement learning. \citet{thumiger2022a} proposed an improved deep reinforcement learning controller for decentralized collision avoidance, using an architecture that incorporates long short-term memory (LSTM) cells and a reward function inspired by gradient-based approaches; the controller outperformed existing techniques in environments with variable numbers of agents. For autonomous vehicles, \citet{ardekani2022combining} proposed an algorithm based on Nash equilibrium and memory neural networks for path selection in highly dynamic and complex environments, with the selected response matching the Nash equilibrium in 90.2\% of simulated situations. These approaches may, however, require extensive training and computational resources, which can be a concern in real-world deployment.
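As an illustration of the reward shaping involved, a gradient-inspired (potential-field-style) avoidance reward can combine goal progress with a proximity penalty; the form below is a generic sketch of this idea, not the exact function used by \citet{thumiger2022a}:
\begin{equation*}
r_t = -\,\alpha \,\bigl\lVert p_t - p_{\mathrm{goal}} \bigr\rVert
\;-\; \beta \sum_{j \ne i} \max\!\Bigl(0,\; d_{\mathrm{safe}} - \bigl\lVert p_t - p_t^{j} \bigr\rVert\Bigr),
\end{equation*}
where $p_t$ is the agent's position, $p_t^{j}$ the positions of the other agents, $d_{\mathrm{safe}}$ a safety radius, and $\alpha, \beta > 0$ weighting coefficients.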