Model-Based Reinforcement Learning (MBRL)
Model-based reinforcement learning differs from its model-free counterpart only in that it learns a dynamics model, but that difference has substantial downstream effects on how decisions are made.
The dynamics model usually models the environment transition dynamics, $s_{t+1} = f_\theta(s_t, a_t)$, but things like inverse dynamics models (mapping from states to actions) or reward models (predicting rewards) can be used in this framework.
Simple definition
- There is an agent that repeatedly tries to solve a problem, accumulating state and action data.
- With that data, the agent creates a structured learning tool, a dynamics model, to reason about the world.
- With the dynamics model, the agent decides how to act by predicting the future.
- With those actions, the agent collects more data, improves said model, and hopefully improves future actions (a minimal sketch of this loop follows the list below).
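To make this loop concrete, here is a minimal sketch on a toy one-dimensional problem. The environment, the linear model, the planning rule, and the helper names (`ToyEnv`, `fit_dynamics_model`, `plan_action`) are all illustrative assumptions for this example, not part of any particular library.

```python
import numpy as np

class ToyEnv:
    """Toy environment: the state drifts with the chosen action plus noise;
    reward is higher the closer the state is to zero."""
    def reset(self):
        self.state = np.random.uniform(-1.0, 1.0)
        return self.state

    def step(self, action):
        self.state = self.state + 0.1 * action + 0.01 * np.random.randn()
        reward = -abs(self.state)
        return self.state, reward

def fit_dynamics_model(data):
    """Fit a linear dynamics model s' ~ a*s + b*u + c by least squares."""
    states, actions, next_states = map(np.array, zip(*data))
    X = np.stack([states, actions, np.ones_like(states)], axis=1)
    coeffs, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return coeffs  # (a, b, c)

def plan_action(model, state, candidates=np.linspace(-1, 1, 21)):
    """Pick the candidate action whose predicted next state gives the best reward."""
    a, b, c = model
    predicted_next = a * state + b * candidates + c
    return candidates[np.argmax(-np.abs(predicted_next))]

env = ToyEnv()
data = []
for episode in range(10):
    state = env.reset()
    for t in range(20):
        if len(data) < 50:
            action = np.random.uniform(-1, 1)      # explore before a model exists
        else:
            model = fit_dynamics_model(data)       # learn a dynamics model from the data
            action = plan_action(model, state)     # decide how to act by predicting the future
        next_state, reward = env.step(action)
        data.append((state, action, next_state))   # accumulate state and action data
        state = next_state
```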
Academic definition
Model-based reinforcement learning (MBRL) follows the framework of an agent interacting in an environment, learning a model of said environment, and then **leveraging the model for control (making decisions)**.
Specifically, the agent acts in a Markov Decision Process (MDP) governed by a transition function $s_{t+1} = f(s_t, a_t)$ and returns a reward at each step $r(s_t, a_t)$. With a collected dataset $\mathcal{D} := \{(s_i, a_i, s_{i+1}, r_i)\}$, the agent learns a model, $s_{t+1} = f_\theta(s_t, a_t)$, to minimize the negative log-likelihood of the transitions.
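As a concrete illustration of the model-learning step, below is a short sketch of fitting a probabilistic dynamics model $f_\theta$ by minimizing the Gaussian negative log-likelihood of observed transitions. The state and action dimensions, the network architecture, and the random placeholder data are assumptions made for the example, not details from the text.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2

class GaussianDynamicsModel(nn.Module):
    """Predicts a diagonal Gaussian over the next state given (state, action)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 2 * state_dim),  # mean and log-variance per state dimension
        )

    def forward(self, state, action):
        out = self.net(torch.cat([state, action], dim=-1))
        mean, log_var = out.chunk(2, dim=-1)
        return mean, log_var

model = GaussianDynamicsModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Placeholder dataset D = {(s_i, a_i, s_{i+1})}; in practice this comes from
# the agent's own interaction with the environment.
states = torch.randn(256, state_dim)
actions = torch.randn(256, action_dim)
next_states = torch.randn(256, state_dim)

for step in range(100):
    mean, log_var = model(states, actions)
    # Negative log-likelihood of the transitions under a diagonal Gaussian
    # (up to an additive constant).
    nll = 0.5 * (((next_states - mean) ** 2) * torch.exp(-log_var) + log_var).sum(dim=-1).mean()
    optimizer.zero_grad()
    nll.backward()
    optimizer.step()
```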
We employ sample-based model-predictive control (MPC) using the learned dynamics model, which optimizes the expected reward over a finite, recursively predicted horizon, $\tau$, from a set of actions sampled from a uniform distribution $U(a)$ (see paper or paper or paper).
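One common way to write this objective is $a^*_{t:t+\tau} = \arg\max_{a_{t:t+\tau}} \sum_{i=t}^{t+\tau} r(\hat{s}_i, a_i)$, where $\hat{s}_{i+1} = f_\theta(\hat{s}_i, a_i)$ is the recursive model prediction. The sketch below implements sample-based MPC by random shooting, assuming the `GaussianDynamicsModel` from the previous sketch, a known reward function, and actions bounded in $[-1, 1]$; all of these are illustrative assumptions rather than details from the text.

```python
import torch

def reward_fn(state, action):
    """Illustrative reward: keep the state near the origin."""
    return -(state ** 2).sum(dim=-1)

def random_shooting_mpc(dynamics_model, state, horizon=10, num_samples=500, action_dim=2):
    """Sample action sequences from a uniform distribution, roll them out through
    the learned model, and return the first action of the best sequence."""
    # Candidate action sequences: (num_samples, horizon, action_dim), uniform in [-1, 1].
    actions = torch.rand(num_samples, horizon, action_dim) * 2 - 1
    states = state.expand(num_samples, -1).clone()
    total_reward = torch.zeros(num_samples)
    with torch.no_grad():
        for t in range(horizon):
            total_reward += reward_fn(states, actions[:, t])
            mean, _ = dynamics_model(states, actions[:, t])  # recursively predict next states
            states = mean                                    # roll forward with the mean prediction
    best = torch.argmax(total_reward)
    return actions[best, 0]  # execute only the first action, then replan (MPC)

# Example usage, reusing the model trained in the previous sketch:
# action = random_shooting_mpc(model, state=torch.zeros(4))
```

Only the first action of the best sequence is executed; the agent then observes the new state and replans, which is what distinguishes MPC from open-loop trajectory optimization.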
Further reading
For more information on MBRL, we recommend you check out the following resources:
Author
This section was written by Nathan Lambert