The Decision Transformer model was introduced in “Decision Transformer: Reinforcement Learning via Sequence Modeling” by Chen et al. It abstracts Reinforcement Learning as a conditional sequence modeling problem.
The main idea is that instead of training a policy using RL methods, such as fitting a value function that tells us what action to take to maximize the return (cumulative reward), we use a sequence modeling algorithm (a Transformer) that, given a desired return, past states, and past actions, generates the future actions needed to achieve that return. It’s an autoregressive model: each action is predicted conditioned on the desired return and on the states and actions that came before it.
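Concretely, the “desired return” the model is conditioned on is the return-to-go: at each timestep, the sum of the rewards remaining in the trajectory from that point onward. Here is a minimal sketch of how this conditioning signal is computed from a trajectory’s rewards (the function name is illustrative, not part of any library):

```python
def returns_to_go(rewards: list[float]) -> list[float]:
    """Compute the return-to-go at each timestep: the sum of all rewards
    from that timestep to the end of the trajectory."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards, accumulating future rewards.
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

print(returns_to_go([1.0, 0.0, 2.0, 1.0]))  # [4.0, 3.0, 3.0, 1.0]
```

At training time, the model sees sequences of (return-to-go, state, action) triples; at inference time, we simply set the first return-to-go to the performance we want.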
This is a complete shift in the Reinforcement Learning paradigm since we use generative trajectory modeling (modeling the joint distribution of the sequence of states, actions, and rewards) to replace conventional RL algorithms. This means that in Decision Transformers, we don’t maximize the return but rather generate a series of future actions that achieve the desired return.
The 🤗 Transformers team integrated the Decision Transformer, an Offline Reinforcement Learning method, into the library, and pretrained checkpoints are available on the Hugging Face Hub.
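Since the model ships with 🤗 Transformers, we can sketch what inference looks like. The snippet below is a minimal, illustrative example rather than the full training tutorial: it uses the `DecisionTransformerModel` class from 🤗 Transformers, and it assumes a pretrained half-cheetah checkpoint on the Hub such as `edbeeching/decision-transformer-gym-halfcheetah-medium` (checkpoint name and target return value are assumptions for illustration):

```python
import torch
from transformers import DecisionTransformerModel

# Load a pretrained checkpoint from the Hub (checkpoint name is an assumption;
# several Gym locomotion checkpoints were released alongside the integration).
model = DecisionTransformerModel.from_pretrained(
    "edbeeching/decision-transformer-gym-halfcheetah-medium"
)
model.eval()

state_dim = model.config.state_dim  # 17 for HalfCheetah
act_dim = model.config.act_dim      # 6 for HalfCheetah

# A context of length 1: the return we want to achieve, the current state,
# and a placeholder action for the model to fill in.
states = torch.randn(1, 1, state_dim)       # (batch, seq_len, state_dim); use a real (normalized) observation in practice
actions = torch.zeros(1, 1, act_dim)        # past actions (none yet)
rewards = torch.zeros(1, 1)                 # part of the signature; the model conditions on returns-to-go, not raw rewards
returns_to_go = torch.tensor([[[3600.0]]])  # (batch, seq_len, 1); the desired return (arbitrary value here)
timesteps = torch.zeros(1, 1, dtype=torch.long)
attention_mask = torch.ones(1, 1)

with torch.no_grad():
    outputs = model(
        states=states,
        actions=actions,
        rewards=rewards,
        returns_to_go=returns_to_go,
        timesteps=timesteps,
        attention_mask=attention_mask,
        return_dict=True,
    )

# The action predicted for the last timestep, conditioned on the desired return.
next_action = outputs.action_preds[0, -1]
```

In a real rollout, at each environment step you would append the new state to the context, subtract the received reward from the return-to-go, and feed the growing (truncated) context back into the model to get the next action.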
To learn more about Decision Transformers, you should read our blog post: Introducing Decision Transformers on Hugging Face.
Now that you understand how Decision Transformers work thanks to Introducing Decision Transformers on Hugging Face, you’re ready to learn how to train your first Offline Decision Transformer model from scratch to make a half-cheetah run.
Start the tutorial here 👉 https://huggingface.co/blog/train-decision-transformers
For more information, we recommend that you check out the following resources:
This section was written by Edward Beeching.