# Model-Based Reinforcement Learning (MBRL)

Model-based reinforcement learning differs from its model-free counterpart only in that it learns a *dynamics model*, but this has substantial downstream effects on how decisions are made.

The dynamics model usually models the environment transition dynamics, $s_{t+1} = f_\theta (s_t, a_t)$, but other models, such as inverse dynamics models (mapping from states to actions) or reward models (predicting rewards), can also be used in this framework.

## Simple definition

- There is an agent that repeatedly tries to solve a problem, **accumulating state and action data**.
- With that data, the agent creates a structured learning tool, *a dynamics model*, to reason about the world.
- With the dynamics model, the agent **decides how to act by predicting the future**.
- With those actions, **the agent collects more data, improves said model, and hopefully improves future actions**.
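The four steps above can be sketched as a small loop. This is a minimal illustration, not a reference implementation: the 1-D linear environment `true_step`, the `reward` function, the least-squares `fit_model`, and the one-step `plan` are all hypothetical stand-ins chosen to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_step(s, a):
    # Hypothetical environment, unknown to the agent: s' = 0.9 * s + 0.5 * a
    return 0.9 * s + 0.5 * a

def reward(s, a):
    # Hypothetical reward: stay close to the origin (the action is ignored here)
    return -s ** 2

def fit_model(data):
    # Least-squares fit of a linear dynamics model s' ~ w[0] * s + w[1] * a
    X = np.array([[s, a] for s, a, _ in data])
    y = np.array([s_next for _, _, s_next in data])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def plan(w, s, candidates):
    # Predict the next state for every candidate action and keep the best one
    predicted = w[0] * s + w[1] * candidates
    return candidates[np.argmax(reward(predicted, candidates))]

# Step 1: accumulate state and action data with random actions
data, s = [], 2.0
for _ in range(20):
    a = rng.uniform(-1.0, 1.0)
    s_next = true_step(s, a)
    data.append((s, a, s_next))
    s = s_next

candidates = np.linspace(-1.0, 1.0, 21)
for iteration in range(3):
    w = fit_model(data)                 # Step 2: learn a dynamics model
    s = 2.0                             # reset to the same start state
    for _ in range(10):
        a = plan(w, s, candidates)      # Step 3: act by predicting the future
        s_next = true_step(s, a)
        data.append((s, a, s_next))     # Step 4: collect more data
        s = s_next
```

Because this toy environment is linear and noise-free, the fitted weights match the true dynamics almost immediately; in realistic problems the model is imperfect, which is why the loop keeps collecting data and refitting.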

## Academic definition

Model-based reinforcement learning (MBRL) follows the framework of an agent interacting in an environment, **learning a model of said environment**, and then **leveraging the model for control (making decisions)**.

Specifically, the agent acts in a Markov Decision Process (MDP) governed by a transition function $s_{t+1} = f (s_t , a_t)$ and receives a reward $r(s_t, a_t)$ at each step. With a collected dataset $D := \{ s_i, a_i, s_{i+1}, r_i \}$, the agent learns a model, $s_{t+1} = f_\theta (s_t , a_t)$, **to minimize the negative log-likelihood of the transitions**.
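Written out, one common form of this objective is shown here, assuming the learned model defines a conditional distribution $p_\theta(s_{i+1} \mid s_i, a_i)$ (a notation introduced only for this illustration):

$$
\theta^* = \arg\min_\theta \; - \sum_{(s_i, a_i, s_{i+1}) \in D} \log p_\theta (s_{i+1} \mid s_i, a_i)
$$

If $p_\theta$ is taken to be a Gaussian with mean $f_\theta (s_i, a_i)$ and a fixed isotropic covariance, this reduces, up to constants, to the familiar squared-error loss $\sum_i \lVert s_{i+1} - f_\theta (s_i, a_i) \rVert^2$.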

We employ sample-based model-predictive control (MPC) using the learned dynamics model, which optimizes the expected reward over a finite, recursively predicted horizon, $\tau$, from a set of actions sampled from a uniform distribution $U(a)$ (see paper or paper or paper).
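A minimal sketch of this kind of sample-based planner (often called random shooting) is given below. The function name `random_shooting_mpc`, its parameters, and the toy `model` and `reward_fn` are assumptions made for illustration. In practice `model` would be the learned $f_\theta$; here a fixed linear function stands in for it.

```python
import numpy as np

def random_shooting_mpc(model, reward_fn, s0, horizon, n_candidates,
                        a_low, a_high, rng):
    # Sample candidate action sequences from a uniform distribution U(a)
    actions = rng.uniform(a_low, a_high, size=(n_candidates, horizon))
    states = np.full(n_candidates, s0, dtype=float)
    returns = np.zeros(n_candidates)
    for t in range(horizon):
        # Accumulate r(s_t, a_t), then predict s_{t+1} recursively with the model
        returns += reward_fn(states, actions[:, t])
        states = model(states, actions[:, t])
    best = np.argmax(returns)
    # MPC executes only the first action of the best sequence, then replans
    return actions[best, 0], returns[best]

# Stand-in for a learned dynamics model (hypothetical, 1-D)
def model(s, a):
    return 0.9 * s + 0.5 * a

# Hypothetical reward: stay near the origin with a small action penalty
def reward_fn(s, a):
    return -(s ** 2) - 0.01 * a ** 2

rng = np.random.default_rng(0)
a0, best_return = random_shooting_mpc(
    model, reward_fn, s0=2.0, horizon=3, n_candidates=1000,
    a_low=-1.0, a_high=1.0, rng=rng,
)
```

Only the first action of the best sequence is applied. At the next step the planner is run again from the newly observed state, which limits how far errors in the learned model can compound.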

## Further reading

For more information on MBRL, we recommend you check out the following resources:

## Author

This section was written by Nathan Lambert