Model Based Reinforcement Learning (MBRL)

Model-based reinforcement learning differs from its model-free counterpart only in that it learns a dynamics model, but that difference has substantial downstream effects on how decisions are made.

The dynamics model usually models the environment's transition dynamics, $s_{t+1} = f_\theta (s_t, a_t)$, but other models, such as inverse dynamics models (mapping from states to actions) or reward models (predicting rewards), can also be used in this framework.

Simple definition

There is an agent that repeatedly tries to solve a problem, accumulating state and action data.
With that data, the agent creates a structured learning tool, a dynamics model, to reason about the world.
With the dynamics model, the agent decides how to act by predicting the future.
With those actions, the agent collects more data, improves said model, and hopefully improves future actions.
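
The four steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not a reference implementation: the 1-D environment `true_step`, the `reward` function, the linear least-squares model, and the one-step greedy `choose_action` are all hypothetical stand-ins chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_step(s, a):
    """Hypothetical environment dynamics; the agent only sees its outputs."""
    return 0.9 * s + 0.5 * a

def reward(s):
    """Hypothetical reward: stay close to the origin."""
    return -s ** 2

def fit_model(data):
    """Fit s_next ~ w[0] * s + w[1] * a by least squares on all transitions."""
    X = np.array([[s, a] for s, a, _ in data])
    y = np.array([s_next for _, _, s_next in data])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def choose_action(w, s, candidates):
    """Predict one step ahead for each candidate action and keep the best."""
    predicted_next = w[0] * s + w[1] * candidates
    return candidates[np.argmax(reward(predicted_next))]

data = []                                # accumulated (s, a, s_next) transitions
w = None                                 # model parameters, learned from data
candidates = np.linspace(-1.0, 1.0, 21)  # actions the planner may pick from

for episode in range(5):
    s = 2.0
    for t in range(10):
        # Before any model exists, act randomly to gather data.
        if w is None:
            a = rng.uniform(-1.0, 1.0)
        else:
            a = choose_action(w, s, candidates)
        s_next = true_step(s, a)
        data.append((s, a, s_next))
        s = s_next
    # Improve the model with everything collected so far.
    w = fit_model(data)
```

Because the toy environment is linear and noise-free, the fitted `w` matches the true coefficients after the first episode; with real environments the model is only approximate and keeps improving as more data arrives.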

Academic definition

Model-based reinforcement learning (MBRL) follows the framework of an agent interacting in an environment, learning a model of said environment, and then **leveraging the model for control** (making decisions).

Specifically, the agent acts in a Markov Decision Process (MDP) governed by a transition function $s_{t+1} = f (s_t , a_t)$ and receives a reward $r(s_t, a_t)$ at each step. With a collected dataset $D := \{ s_i, a_i, s_{i+1}, r_i \}$, the agent learns a model, $s_{t+1} = f_\theta (s_t , a_t)$, that minimizes the negative log-likelihood of the transitions.
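
As a worked form of that objective, the negative log-likelihood over the dataset can be written as below. The predictive distribution $p_\theta$ and the dataset size $N$ are notation introduced here for illustration; the text above does not fix a particular distribution.

$$\theta^* = \arg\min_\theta \; - \sum_{i=1}^{N} \log p_\theta (s_{i+1} \mid s_i, a_i)$$

If one assumes $p_\theta$ is a Gaussian with mean $f_\theta (s_i, a_i)$ and fixed covariance $\sigma^2 I$, each term becomes

$$- \log p_\theta (s_{i+1} \mid s_i, a_i) = \frac{1}{2 \sigma^2} \left\| s_{i+1} - f_\theta (s_i, a_i) \right\|^2 + \text{const},$$

so under that assumption minimizing the negative log-likelihood is the same as minimizing the mean squared one-step prediction error.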

We employ sample-based model-predictive control (MPC) using the learned dynamics model: candidate action sequences are sampled from a uniform distribution $U(a)$, each is rolled out recursively through the model over a finite horizon $\tau$, and the sequence with the highest expected predicted reward is selected (see paper or paper or paper).
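
One common way to realize sample-based MPC is "random shooting", sketched below. This is an illustrative sketch rather than the method of any specific paper: the function `random_shooting_mpc`, the stand-in `model` (playing the role of a learned $f_\theta$), the `reward_fn`, and all parameter values are assumptions made for the example.

```python
import numpy as np

def random_shooting_mpc(model, reward_fn, s0, horizon, n_candidates,
                        a_low, a_high, rng):
    """Return the first action of the best uniformly sampled action sequence."""
    # Sample n_candidates action sequences of length `horizon` from U(a_low, a_high).
    actions = rng.uniform(a_low, a_high, size=(n_candidates, horizon))
    states = np.full(n_candidates, s0, dtype=float)
    returns = np.zeros(n_candidates)
    for t in range(horizon):
        # Score r(s_t, a_t), then predict s_{t+1} with the model.
        returns += reward_fn(states, actions[:, t])
        # Recursive prediction: each predicted state feeds the next model call.
        states = model(states, actions[:, t])
    best = np.argmax(returns)
    return actions[best, 0], returns[best]

# Hypothetical stand-ins for a learned model f_theta and a reward r(s, a).
def model(s, a):
    return 0.9 * s + 0.5 * a

def reward_fn(s, a):
    return -s ** 2 - 0.01 * a ** 2

rng = np.random.default_rng(0)
s = 2.0
for step in range(20):
    # Re-plan at every step and execute only the first action (receding horizon).
    a, predicted_return = random_shooting_mpc(
        model, reward_fn, s, horizon=5, n_candidates=1000,
        a_low=-1.0, a_high=1.0, rng=rng,
    )
    # In practice the real environment would be stepped here, not the model.
    s = model(s, a)
```

Only the first action of the winning sequence is executed before re-planning, which limits how far compounding model errors can carry the agent off course.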

Author

This section was written by Nathan Lambert
