Deep RL Course documentation

Model Based Reinforcement Learning (MBRL)

Model-based reinforcement learning differs from its model-free counterpart only in learning a dynamics model, but that difference has substantial downstream effects on how decisions are made.

The dynamics model usually captures the environment's transition dynamics, $s_{t+1} = f_\theta(s_t, a_t)$, but other learned models, such as inverse dynamics models (predicting the action $a_t$ that takes $s_t$ to $s_{t+1}$) or reward models (predicting rewards), can also be used in this framework.
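To make the difference between these learned models concrete, here is a minimal sketch of how the same logged transitions become (input, target) training pairs for each of them. It assumes 1-D states and actions stored as hypothetical `(s_t, a_t, s_{t+1}, r_t)` tuples; the helper names are illustrative and not taken from any library.

```python
from typing import List, Tuple

# One logged transition (s_t, a_t, s_{t+1}, r_t); 1-D states and actions for simplicity.
Transition = Tuple[float, float, float, float]
Pair = Tuple[Tuple[float, float], float]

def forward_pairs(data: List[Transition]) -> List[Pair]:
    """Dynamics model f_theta: input (s_t, a_t) -> target s_{t+1}."""
    return [((s, a), s_next) for s, a, s_next, _ in data]

def inverse_pairs(data: List[Transition]) -> List[Pair]:
    """Inverse dynamics model: input (s_t, s_{t+1}) -> target a_t."""
    return [((s, s_next), a) for s, a, s_next, _ in data]

def reward_pairs(data: List[Transition]) -> List[Pair]:
    """Reward model: input (s_t, a_t) -> target r_t."""
    return [((s, a), r) for s, a, _, r in data]

# Two made-up transitions (assumed dynamics s' = 0.9*s + 0.5*a, reward -s'^2).
data = [(0.0, 1.0, 0.5, -0.25), (0.5, -1.0, -0.05, -0.0025)]
print(forward_pairs(data)[0])  # ((0.0, 1.0), 0.5)
print(inverse_pairs(data)[0])  # ((0.0, 0.5), 1.0)
```

All three models can be trained from one dataset; only the choice of inputs and targets changes.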

Simple definition

  • There is an agent that repeatedly tries to solve a problem, accumulating state and action data.
  • With that data, the agent creates a structured learning tool, a dynamics model, to reason about the world.
  • With the dynamics model, the agent decides how to act by predicting the future.
  • With those actions, the agent collects more data, improves said model, and hopefully improves future actions.
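The loop above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a reference implementation: the environment is a hypothetical noise-free 1-D system ($s_{t+1} = 0.9 s_t + 0.5 a_t$ with reward $-s_{t+1}^2$), the dynamics model is a linear least-squares fit, and the agent plans greedily one step ahead over a grid of candidate actions. All function names are made up for this example.

```python
import random

# Hypothetical, noise-free 1-D environment (an assumption for illustration):
# s_{t+1} = 0.9*s_t + 0.5*a_t, with reward -s_{t+1}^2 (drive the state to 0).
def env_step(s, a):
    s_next = 0.9 * s + 0.5 * a
    return s_next, -s_next ** 2

def fit_dynamics(data):
    """Least-squares fit of s_{t+1} ~ w_s*s_t + w_a*a_t via the 2x2 normal equations."""
    sss = sum(s * s for s, a, _ in data)
    ssa = sum(s * a for s, a, _ in data)
    saa = sum(a * a for s, a, _ in data)
    bs = sum(s * sn for s, a, sn in data)
    ba = sum(a * sn for s, a, sn in data)
    det = sss * saa - ssa ** 2
    return (bs * saa - ba * ssa) / det, (ba * sss - bs * ssa) / det

def plan(model, s, candidates):
    """Greedy one-step lookahead: pick the action whose *predicted* next state scores best."""
    w_s, w_a = model
    return max(candidates, key=lambda a: -(w_s * s + w_a * a) ** 2)

def mbrl_loop(iterations=5, steps=20, seed=0):
    rng = random.Random(seed)
    candidates = [i / 10 for i in range(-10, 11)]  # actions in [-1, 1]
    # Bootstrap the dataset with random actions.
    data, s = [], rng.uniform(-2, 2)
    for _ in range(steps):
        a = rng.choice(candidates)
        s_next = env_step(s, a)[0]
        data.append((s, a, s_next))
        s = s_next
    returns = []
    for _ in range(iterations):
        model = fit_dynamics(data)               # 1. fit the model to all data so far
        s, total = rng.uniform(-2, 2), 0.0
        for _ in range(steps):
            a = plan(model, s, candidates)       # 2. act by predicting the future
            s_next, r = env_step(s, a)
            data.append((s, a, s_next))          # 3. collect more data
            total += r
            s = s_next
        returns.append(total)
    return fit_dynamics(data), returns

model, returns = mbrl_loop()
print(model)    # roughly (0.9, 0.5), because the toy data is noise-free
print(returns)  # one non-positive episode return per iteration
```

Because the toy system is noise-free and linear, the model is essentially exact after the first fit; with real environments and neural-network models, the refit on newly collected data is where later iterations improve.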

Academic definition

Model-based reinforcement learning (MBRL) follows the framework of an agent interacting with an environment, learning a model of that environment, and then leveraging the model for control (making decisions).

Specifically, the agent acts in a Markov Decision Process (MDP) governed by a transition function $s_{t+1} = f(s_t, a_t)$ and receives a reward $r(s_t, a_t)$ at each step. With a collected dataset $D := \{s_i, a_i, s_{i+1}, r_i\}$, the agent learns a model $s_{t+1} = f_\theta(s_t, a_t)$ that minimizes the negative log-likelihood of the transitions.
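As a minimal sketch of the model-learning step, assume a hypothetical linear-Gaussian model $p_\theta(s_{t+1} \mid s_t, a_t) = \mathcal{N}(w_s s_t + w_a a_t, \sigma^2)$ in place of the neural networks typically used. For this simple model the negative log-likelihood has a closed-form minimizer: least squares for the mean weights, and the mean squared residual for $\sigma^2$. The data-generating system and its noise level below are made up for illustration.

```python
import math
import random

# Hypothetical "true" system for illustration: s' = 0.9*s + 0.5*a + noise,
# with Gaussian noise of standard deviation 0.1 (all values are assumptions).
def collect_transitions(n=500, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
        s_next = 0.9 * s + 0.5 * a + rng.gauss(0.0, 0.1)
        data.append((s, a, s_next))
    return data

def gaussian_nll(params, data):
    """Mean negative log-likelihood of s' under N(w_s*s + w_a*a, sigma^2)."""
    w_s, w_a, sigma = params
    total = 0.0
    for s, a, s_next in data:
        err = s_next - (w_s * s + w_a * a)
        total += 0.5 * math.log(2 * math.pi * sigma ** 2) + err ** 2 / (2 * sigma ** 2)
    return total / len(data)

def fit_max_likelihood(data):
    """Closed-form NLL minimizer for the linear-Gaussian model:
    least squares for the mean weights, mean squared residual for sigma^2."""
    sss = sum(s * s for s, a, _ in data)
    ssa = sum(s * a for s, a, _ in data)
    saa = sum(a * a for s, a, _ in data)
    bs = sum(s * y for s, a, y in data)
    ba = sum(a * y for s, a, y in data)
    det = sss * saa - ssa ** 2
    w_s = (bs * saa - ba * ssa) / det
    w_a = (ba * sss - bs * ssa) / det
    mse = sum((y - w_s * s - w_a * a) ** 2 for s, a, y in data) / len(data)
    return w_s, w_a, math.sqrt(mse)

data = collect_transitions()
params = fit_max_likelihood(data)
print(params)                      # close to the assumed (0.9, 0.5, 0.1)
print(gaussian_nll(params, data))  # no other (w_s, w_a, sigma) gives a lower value
```

With a neural-network mean and variance there is no closed form, so the same NLL would instead be minimized by gradient descent.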

We employ sample-based model-predictive control (MPC) with the learned dynamics model. MPC optimizes the expected reward over a finite, recursively predicted horizon $\tau$, choosing from a set of actions sampled from a uniform distribution $U(a)$ (see paper or paper or paper).
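The planning step can be sketched as random-shooting MPC, one simple sample-based variant: sample action sequences of length $\tau$ (here `horizon`) from a uniform distribution, roll each one out recursively through the model, score it by its summed predicted reward, execute only the first action of the best sequence, and re-plan at the next step. The model, reward, and parameter values below are hypothetical; because this toy model is deterministic, the "expected" reward reduces to the predicted return of each sequence.

```python
import random

# Assumed learned model and reward (hypothetical, for illustration only):
# f_theta(s, a) = 0.9*s + 0.5*a, and reward -s_{t+1}^2 for reaching state s_{t+1}.
def model(s, a):
    return 0.9 * s + 0.5 * a

def reward(s_next):
    return -s_next ** 2

def random_shooting_mpc(s, rng, horizon=3, n_samples=1000, a_low=-1.0, a_high=1.0):
    """Sample action sequences from U(a_low, a_high), roll each out recursively
    through the model, and return the first action of the best sequence."""
    best_return, best_first = float("-inf"), 0.0
    for _ in range(n_samples):
        actions = [rng.uniform(a_low, a_high) for _ in range(horizon)]
        s_pred, total = s, 0.0
        for a in actions:
            s_pred = model(s_pred, a)  # recursive prediction: feed the prediction back in
            total += reward(s_pred)
        if total > best_return:
            best_return, best_first = total, actions[0]
    return best_first

def run_mpc(s0=2.0, steps=20, seed=0):
    """Closed loop: re-plan from the current state at every step."""
    rng = random.Random(seed)
    s, states, actions = s0, [s0], []
    for _ in range(steps):
        a = random_shooting_mpc(s, rng)
        s = model(s, a)  # the toy "true" environment is assumed to equal the model
        states.append(s)
        actions.append(a)
    return states, actions

states, actions = run_mpc()
print(states[:5])  # the state is driven from 2.0 toward 0
```

Feeding each prediction back into the model is what makes the horizon "recursively predicted"; model errors compound along such rollouts, which is one reason to keep the horizon short and re-plan at every step.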

Further reading

For more information on MBRL, we recommend you check out the following resources:


This section was written by Nathan Lambert