arxiv:1706.09520

Neural SLAM: Learning to Explore with External Memory

Published on Jun 29, 2017

Abstract

We present an approach for agents to learn representations of a global map from sensor data, to aid their exploration in new environments. To achieve this, we embed procedures mimicking that of traditional Simultaneous Localization and Mapping (SLAM) into the soft attention based addressing of external memory architectures, in which the external memory acts as an internal representation of the environment. This structure encourages the evolution of SLAM-like behaviors inside a completely differentiable deep neural network. We show that this approach can help reinforcement learning agents to successfully explore new environments where long-term memory is essential. We validate our approach in both challenging grid-world environments and preliminary Gazebo experiments. A video of our experiments can be found at: https://goo.gl/G2Vu5y.

Community

Introduces Neural SLAM, which integrates reinforcement learning with an external memory architecture (the memory stores an internal representation of the environment) for exploration tasks by embedding the procedures of a classical SLAM system into a fully differentiable neural network.

Relying on the weights and hidden state of an LSTM or RNN as the only memory limits the network's expressive capacity over long horizons; an external memory architecture such as the differentiable neural computer (DNC) or neural Turing machine (NTM) avoids this (the paper uses an NTM-style memory).

The exploration task is formulated as a Markov Decision Process (MDP) whose policy is learned with the asynchronous advantage actor-critic (A3C) algorithm combined with a truncated generalized advantage estimator (GAE). A3C runs multiple parallel workers, each interacting with its own copy of the environment; GAE forms an exponentially weighted average of k-step advantage estimators for the policy-gradient update, and the shared model is updated lock-free in HOGWILD! style (see the GAE sketch below).

At every time step, the current observation and the previous read vector are fed to an LSTM controller. Linear projection heads on the controller's hidden state produce, for each write head, the control variables for data association and the measurement update, plus the add and erase vectors for the mapping update to memory. Localization is emulated by propagating the previous access weights through a motion model.

Data association compares a projected key against the contents of the external memory, yielding content-based access weights. The measurement update then interpolates these with the motion-predicted weights through a gating function, convolves the result with a shift kernel, and sharpens or smooths it with a scalar exponent (sketched below). The access weights are written to the map using the erase and add vectors, which act like gates over the previous memory contents (also sketched below). The policy receives the read vector and the access weights, and linear heads compute the policy distribution and value estimate, from which the actor and critic losses are derived.

Experiments cover a simulated grid-world environment and a preliminary Gazebo environment. At test time, Neural SLAM reaches targets in fewer steps and with a higher success ratio than the A3C baselines. From the University of Freiburg and HKUST.
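The data-association and measurement-update operations follow the standard NTM soft-addressing mechanism. A minimal NumPy sketch, using the usual NTM parameter names (key, beta, g, shift kernel, gamma) as assumptions rather than the authors' exact notation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_addressing(memory, key, beta):
    """Data association: cosine similarity between a projected key and every
    memory slot, sharpened by a positive key strength beta."""
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    return softmax(beta * sim)  # content-based access weights over slots

def measurement_update(w_content, w_motion, g, shift_kernel, gamma):
    """Gate the content weights against the motion-predicted weights,
    circularly convolve with a shift kernel, then sharpen or smooth
    with a scalar exponent gamma."""
    w = g * w_content + (1.0 - g) * w_motion      # interpolation gate
    n = len(w)
    w = np.array([sum(w[i] * shift_kernel[(j - i) % n] for i in range(n))
                  for j in range(n)])             # circular convolution
    w = w ** gamma                                # gamma > 1 sharpens, < 1 smooths
    return w / w.sum()

# Example: 16 memory slots of width 8; shifts restricted to {-1, 0, +1}.
M = np.random.rand(16, 8)
w_motion = np.full(16, 1.0 / 16)
w_c = content_addressing(M, key=np.random.rand(8), beta=5.0)
kernel = np.zeros(16)
kernel[[-1, 0, 1]] = [0.1, 0.8, 0.1]
w = measurement_update(w_c, w_motion, g=0.9, shift_kernel=kernel, gamma=2.0)
```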
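The mapping update writes the access weights into memory through erase and add vectors, and the read vector for the next step is a convex combination of the slots. A sketch under the same assumptions:

```python
def write(memory, w, erase, add):
    """Mapping update: attenuate each slot by the erase vector, then add the
    add vector, both scaled by the access weights. Erase entries lie in
    (0, 1) (sigmoid outputs), so they gate how much old content survives."""
    memory = memory * (1.0 - np.outer(w, erase))  # erase old content
    return memory + np.outer(w, add)              # write new content

def read(memory, w):
    """Read vector: access-weighted combination of the memory slots."""
    return w @ memory
```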
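The actor-critic heads are plain linear layers on top of the controller state and the read vector. A hypothetical sketch (the weight shapes and the concatenation are illustrative assumptions, not the paper's exact wiring):

```python
def policy_value(hidden, read_vec, W_pi, b_pi, W_v, b_v):
    """Linear policy and value heads: the policy distribution is a softmax
    over action logits, the value estimate is a scalar projection."""
    x = np.concatenate([hidden, read_vec])
    pi = softmax(W_pi @ x + b_pi)   # action distribution for the actor loss
    v = float(W_v @ x + b_v)        # state-value estimate for the critic loss
    return pi, v
```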
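Training uses A3C with a truncated GAE: each worker rolls out a fixed number of steps, bootstraps the tail with the critic, and accumulates exponentially weighted TD errors. A minimal sketch (the gamma and lam values are illustrative defaults):

```python
def truncated_gae(rewards, values, bootstrap, gamma=0.99, lam=0.95):
    """Truncated generalized advantage estimation over a T-step rollout.
    values holds V(s_t) for t = 0..T-1; bootstrap is V(s_T), closing the
    truncated trajectory. lam exponentially weights the k-step estimators."""
    T = len(rewards)
    v = np.append(values, bootstrap)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * v[t + 1] - v[t]  # one-step TD error
        gae = delta + gamma * lam * gae               # exponential weighting
        advantages[t] = gae
    returns = advantages + np.asarray(values)         # critic regression targets
    return advantages, returns
```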

Links: PapersWithCode, YouTube, GitHub
