\section{Related Work}

\paragraph{Reinforcement Learning and Q-Learning} Reinforcement learning is a learning paradigm for sequential decision-making problems, and Q-learning is one of its fundamental algorithms \cite{2009.07888}. Q-learning, however, is known to suffer from maximization bias, which leads to overestimated action values \cite{2012.01100}. Double Q-learning was proposed to mitigate this overestimation, but it may converge more slowly and requires more memory \cite{2303.08631}. Self-correcting Q-learning is another approach to maximization bias: it balances overestimation against underestimation while retaining convergence guarantees similar to those of Q-learning \cite{2012.01100}.

\paragraph{Deep Reinforcement Learning} Deep reinforcement learning (DRL) combines reinforcement learning with deep neural networks to tackle more complex problems \cite{2108.11510}. DRL has been applied successfully in various domains, including computer vision, where it has been used for landmark localization, object detection, object tracking, image registration, image segmentation, and video analysis \cite{2108.11510}. Despite this success, DRL is data-inefficient because of its trial-and-error learning mechanism, which has motivated sample-efficient methods such as distributed deep reinforcement learning \cite{2212.00253}.

\paragraph{Transfer Learning in Reinforcement Learning} Transfer learning has emerged as a promising way to address challenges in reinforcement learning, such as data inefficiency, by transferring knowledge from external sources to facilitate learning \cite{2009.07888}.
A systematic survey of transfer learning approaches for deep reinforcement learning categorizes them by their goals, methodologies, compatible reinforcement learning backbones, and practical applications \cite{2009.07888}.

\paragraph{Policy Gradient Methods} Policy gradient methods are widely used in reinforcement learning, particularly in continuous action settings. Natural policy gradients have been proposed as a more efficient alternative to traditional policy gradients and form the foundation of contemporary reinforcement learning algorithms such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) \cite{2209.01820}. Off-policy policy gradient methods have also been developed: Actor Critic with Emphatic weightings (ACE) addresses shortcomings of earlier off-policy policy gradient methods such as OffPAC and DPG \cite{1811.09013}.

\paragraph{Group-Agent Reinforcement Learning} Group-agent reinforcement learning has been proposed as a new type of reinforcement learning problem, distinct from both single-agent and multi-agent reinforcement learning \cite{2202.05135}. In this setting, multiple agents each perform a separate reinforcement learning task and share knowledge with one another, but no cooperative or competitive behavior arises as a learning outcome. The Decentralised Distributed Asynchronous Learning (DDAL) framework has been introduced as the first distributed reinforcement learning framework designed for group-agent reinforcement learning, and it is reported to show desirable performance and good scalability \cite{2202.05135}.
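
\paragraph{Maximization Bias in Update-Rule Form} The Q-learning and Double Q-learning updates discussed above can be sketched in standard textbook notation. The symbols used here are assumptions introduced for illustration and are not taken from the cited works: $Q$ is the tabular action-value estimate, $\alpha$ the step size, $\gamma$ the discount factor, and $(s_t, a_t, r_{t+1}, s_{t+1})$ a single observed transition. Under these assumptions, the Q-learning update reads
\[
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \Bigl[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \Bigr].
\]
Because the maximum is taken over noisy estimates, the expected target is biased upward:
\[
\mathrm{E}\Bigl[ \max_{a} Q(s_{t+1}, a) \Bigr] \ge \max_{a} \mathrm{E}\bigl[ Q(s_{t+1}, a) \bigr].
\]
Double Q-learning, in its usual tabular form, keeps two estimates $Q^{A}$ and $Q^{B}$ and uses one to select the action and the other to evaluate it:
\[
Q^{A}(s_t, a_t) \leftarrow Q^{A}(s_t, a_t) + \alpha \Bigl[ r_{t+1} + \gamma\, Q^{B}\bigl(s_{t+1}, a^{*}\bigr) - Q^{A}(s_t, a_t) \Bigr],
\qquad a^{*} = \arg\max_{a} Q^{A}(s_{t+1}, a),
\]
with the roles of $Q^{A}$ and $Q^{B}$ swapped on a randomly chosen half of the updates. Maintaining two estimates instead of one is the source of the additional memory requirement noted above.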