\section{Related Work}
\paragraph{Deep Reinforcement Learning in General}
Deep reinforcement learning (DRL) combines the representational power of deep neural networks with the reinforcement learning framework, enabling remarkable successes in domains such as finance, medicine, healthcare, video games, robotics, and computer vision \cite{2108.11510}. DRL algorithms such as Deep Q-Network (DQN), Trust Region Policy Optimization (TRPO), and Asynchronous Advantage Actor-Critic (A3C) have shown significant advances in solving complex problems \cite{1708.05866}. A comprehensive analysis of the theoretical justification, practical limitations, and empirical properties of DRL algorithms can be found in \cite{1906.10025}.
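For concreteness, the core temporal-difference update behind value-based DRL methods such as DQN can be sketched as follows. This is a minimal illustration with placeholder network sizes, hyperparameters, and randomly generated transitions of our own choosing, not code from the cited works.
\begin{verbatim}
# Minimal sketch of a DQN-style temporal-difference update (illustrative only;
# dimensions and hyperparameters are placeholder assumptions).
import torch
import torch.nn as nn

def q_network(obs_dim: int, n_actions: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(obs_dim, 64), nn.ReLU(),
        nn.Linear(64, n_actions),
    )

obs_dim, n_actions, gamma = 8, 4, 0.99
q_net = q_network(obs_dim, n_actions)        # online network
target_net = q_network(obs_dim, n_actions)   # periodically-synced target network
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# One gradient step on a toy, randomly generated batch of transitions.
batch = 32
s  = torch.randn(batch, obs_dim)
a  = torch.randint(n_actions, (batch,))
r  = torch.randn(batch)
s2 = torch.randn(batch, obs_dim)
done = torch.zeros(batch)

q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a)
with torch.no_grad():
    target = r + gamma * (1 - done) * target_net(s2).max(dim=1).values
loss = nn.functional.mse_loss(q_sa, target)            # TD error
optimizer.zero_grad(); loss.backward(); optimizer.step()
\end{verbatim}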
\paragraph{Playing Atari Games with DRL}
DRL has been particularly successful in playing Atari games, where agents learn to play directly from pixels \cite{1708.05866}. One of the first DRL agents to learn to beat Atari games with the aid of natural language instructions was introduced in \cite{1704.05539}; it used a multimodal embedding between environment observations and natural language to self-monitor progress. Another study \cite{1809.00397} explored the use of DRL agents to transfer knowledge between environments, leveraging the A3C architecture to generalize to a target Atari game using an agent trained on a source game.
\paragraph{Sample Efficiency and Distributed DRL}
Despite its success, DRL suffers from data inefficiency due to its trial-and-error learning mechanism. Several approaches have been developed to address this issue, including environment modeling, experience transfer, and distributed modifications \cite{2212.00253}. Distributed DRL, in particular, has shown potential in applications such as human-computer gaming and intelligent transportation \cite{2212.00253}. A review of distributed DRL methods, the components important for efficient distributed learning, and toolboxes for realizing distributed DRL without significant modifications is given in \cite{2212.00253}.
\paragraph{Mask Atari for Partially Observable Markov Decision Processes}
A recent benchmark called Mask Atari has been introduced to help solve partially observable Markov decision process (POMDP) problems with DRL-based approaches \cite{2203.16777}. Mask Atari is built on Atari 2600 games with controllable, movable, and learnable masks defining the observation area of the target agent, providing a challenging and efficient benchmark for evaluating methods that focus on POMDP problems \cite{2203.16777}.
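To illustrate the general idea of such a movable observation mask, the sketch below zeroes out everything outside a local window of an Atari-sized frame; the frame size, window size, and function interface are our own hypothetical simplifications, not the benchmark's actual API.
\begin{verbatim}
# Illustrative sketch of a movable observation mask in the spirit of Mask Atari.
# Frame shape, mask size, and this interface are assumptions for illustration.
import numpy as np

def masked_observation(frame, center, size=32):
    """Return the frame with everything outside a size x size window zeroed out."""
    masked = np.zeros_like(frame)
    h, w = frame.shape[:2]
    r, c = center
    r0, r1 = max(0, r - size // 2), min(h, r + size // 2)
    c0, c1 = max(0, c - size // 2), min(w, c + size // 2)
    masked[r0:r1, c0:c1] = frame[r0:r1, c0:c1]
    return masked

frame = np.random.randint(0, 256, size=(210, 160), dtype=np.uint8)  # Atari-sized frame
obs = masked_observation(frame, center=(105, 80))  # agent observes only a local window
\end{verbatim}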
\paragraph{MinAtar: Simplified Atari Environments}
To focus more on the behavioral challenges of DRL, MinAtar has been introduced as a set of simplified Atari environments that capture the general mechanics of specific Atari games while reducing the representational complexity \cite{1903.03176}. MinAtar consists of analogues of five Atari games and provides the agent with a $10 \times 10 \times n$ binary state representation, allowing for experiments with significantly less computational expense \cite{1903.03176}. This simplification enables researchers to thoroughly investigate behavioral challenges similar to those inherent in the original Atari environments.
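As a rough illustration of such a compact observation, the sketch below builds a $10 \times 10 \times n$ binary tensor; the channel semantics shown are hypothetical placeholders, not MinAtar's actual channel layout.
\begin{verbatim}
# Illustrative sketch of a 10x10xn binary observation in the spirit of MinAtar.
# Channel meanings (player, enemy, projectile) are hypothetical placeholders.
import numpy as np

n_channels = 4
state = np.zeros((10, 10, n_channels), dtype=bool)
state[9, 5, 0] = True   # e.g. player position in channel 0
state[2, 3, 1] = True   # e.g. an enemy in channel 1
state[4, 5, 2] = True   # e.g. a projectile in channel 2

# An agent would typically flatten or convolve over this small tensor,
# which is far cheaper than processing 210x160 RGB Atari frames.
flat_input = state.astype(np.float32).reshape(-1)  # 10 * 10 * n_channels values
\end{verbatim}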
\paragraph{Expert Q-learning}
Expert Q-learning is a novel algorithm for DRL that incorporates semi-supervised learning into reinforcement learning by splitting Q-values into state values and action advantages \cite{2106.14642}. The algorithm uses an expert network in addition to the Q-network and has been shown to be more resistant to overestimation bias and more robust in performance compared to the baseline Q-learning algorithm \cite{2106.14642}. This approach demonstrates the potential for integrating state values from expert examples into DRL algorithms for improved performance. |
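To make the decomposition concrete, the following minimal sketch splits Q-values into a state value and action advantages and pairs the Q-network with a separate expert network that could score states from expert examples; it is a simplified illustration under our own assumptions (dueling-style mean-centering, placeholder dimensions, untrained expert head), not the authors' implementation.
\begin{verbatim}
# Minimal sketch of splitting Q-values into state values and action advantages,
# with a separate "expert" network scoring states; a simplified illustration,
# not the Expert Q-learning authors' code.
import torch
import torch.nn as nn

class SplitQNetwork(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value_head = nn.Linear(hidden, 1)              # V(s)
        self.advantage_head = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, obs):
        h = self.trunk(obs)
        v = self.value_head(h)
        a = self.advantage_head(h)
        # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)), dueling-style recombination
        return v + a - a.mean(dim=1, keepdim=True)

# A separate expert network could provide semi-supervised targets for V(s)
# from expert examples; here it is just an untrained placeholder.
expert_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))

q_net = SplitQNetwork(obs_dim=8, n_actions=4)
obs = torch.randn(2, 8)
q_values = q_net(obs)           # shape: (2, 4)
expert_value = expert_net(obs)  # hypothetical target for the value head
\end{verbatim}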