Trajectory Transformer
This model is in maintenance mode only, so we won’t accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.30.0.
You can do so by running the following command: pip install -U transformers==4.30.0
.
Overview
The Trajectory Transformer model was proposed in Offline Reinforcement Learning as One Big Sequence Modeling Problem by Michael Janner, Qiyang Li, Sergey Levine.
The abstract from the paper is the following:
Reinforcement learning (RL) is typically concerned with estimating stationary policies or single-step models, leveraging the Markov property to factorize problems in time. However, we can also view RL as a generic sequence modeling problem, with the goal being to produce a sequence of actions that leads to a sequence of high rewards. Viewed in this way, it is tempting to consider whether high-capacity sequence prediction models that work well in other domains, such as natural-language processing, can also provide effective solutions to the RL problem. To this end, we explore how RL can be tackled with the tools of sequence modeling, using a Transformer architecture to model distributions over trajectories and repurposing beam search as a planning algorithm. Framing RL as sequence modeling problem simplifies a range of design decisions, allowing us to dispense with many of the components common in offline RL algorithms. We demonstrate the flexibility of this approach across long-horizon dynamics prediction, imitation learning, goal-conditioned RL, and offline RL. Further, we show that this approach can be combined with existing model-free algorithms to yield a state-of-the-art planner in sparse-reward, long-horizon tasks.
This model was contributed by CarlCochet. The original code can be found here.
Usage tips
This Transformer is used for deep reinforcement learning. To use it, you need to create sequences from actions, states and rewards from all previous timesteps. This model will treat all these elements together as one big sequence (a trajectory).
TrajectoryTransformerConfig
class transformers.TrajectoryTransformerConfig
< source >( vocab_size = 100 action_weight = 5 reward_weight = 1 value_weight = 1 block_size = 249 action_dim = 6 observation_dim = 17 transition_dim = 25 n_layer = 4 n_head = 4 n_embd = 128 embd_pdrop = 0.1 attn_pdrop = 0.1 resid_pdrop = 0.1 learning_rate = 0.0006 max_position_embeddings = 512 initializer_range = 0.02 layer_norm_eps = 1e-12 kaiming_initializer_range = 1 use_cache = True pad_token_id = 1 bos_token_id = 50256 eos_token_id = 50256 **kwargs )
Parameters
- vocab_size (
int
, optional, defaults to 100) — Vocabulary size of the TrajectoryTransformer model. Defines the number of different tokens that can be represented by thetrajectories
passed when calling TrajectoryTransformerModel - action_weight (
int
, optional, defaults to 5) — Weight of the action in the loss function - reward_weight (
int
, optional, defaults to 1) — Weight of the reward in the loss function - value_weight (
int
, optional, defaults to 1) — Weight of the value in the loss function - block_size (
int
, optional, defaults to 249) — Size of the blocks in the trajectory transformer. - action_dim (
int
, optional, defaults to 6) — Dimension of the action space. - observation_dim (
int
, optional, defaults to 17) — Dimension of the observation space. - transition_dim (
int
, optional, defaults to 25) — Dimension of the transition space. - n_layer (
int
, optional, defaults to 4) — Number of hidden layers in the Transformer encoder. - n_head (
int
, optional, defaults to 4) — Number of attention heads for each attention layer in the Transformer encoder. - n_embd (
int
, optional, defaults to 128) — Dimensionality of the embeddings and hidden states. - resid_pdrop (
float
, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler. - embd_pdrop (
int
, optional, defaults to 0.1) — The dropout ratio for the embeddings. - attn_pdrop (
float
, optional, defaults to 0.1) — The dropout ratio for the attention. - hidden_act (
str
orfunction
, optional, defaults to"gelu"
) — The non-linear activation function (function or string) in the encoder and pooler. If string,"gelu"
,"relu"
,"selu"
and"gelu_new"
are supported. - max_position_embeddings (
int
, optional, defaults to 512) — The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048). - initializer_range (
float
, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices. - layer_norm_eps (
float
, optional, defaults to 1e-12) — The epsilon used by the layer normalization layers. - kaiming_initializer_range (`float, optional, defaults to 1) — A coefficient scaling the negative slope of the kaiming initializer rectifier for EinLinear layers.
- use_cache (
bool
, optional, defaults toTrue
) — Whether or not the model should return the last key/values attentions (not used by all models). Only relevant ifconfig.is_decoder=True
. Example —
This is the configuration class to store the configuration of a TrajectoryTransformerModel. It is used to instantiate an TrajectoryTransformer model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the TrajectoryTransformer CarlCochet/trajectory-transformer-halfcheetah-medium-v2 architecture.
Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.
>>> from transformers import TrajectoryTransformerConfig, TrajectoryTransformerModel
>>> # Initializing a TrajectoryTransformer CarlCochet/trajectory-transformer-halfcheetah-medium-v2 style configuration
>>> configuration = TrajectoryTransformerConfig()
>>> # Initializing a model (with random weights) from the CarlCochet/trajectory-transformer-halfcheetah-medium-v2 style configuration
>>> model = TrajectoryTransformerModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
TrajectoryTransformerModel
class transformers.TrajectoryTransformerModel
< source >( config )
Parameters
- config (TrajectoryTransformerConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The bare TrajectoryTransformer Model transformer outputting raw hidden-states without any specific head on top. This model is a PyTorch torch.nn.Module sub-class. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.
the full GPT language model, with a context size of block_size
forward
< source >( trajectories: Optional = None past_key_values: Optional = None targets: Optional = None attention_mask: Optional = None use_cache: Optional = None output_attentions: Optional = None output_hidden_states: Optional = None return_dict: Optional = None ) → transformers.models.deprecated.trajectory_transformer.modeling_trajectory_transformer.TrajectoryTransformerOutput
or tuple(torch.FloatTensor)
Parameters
- trajectories (
torch.LongTensor
of shape(batch_size, sequence_length)
) — Batch of trajectories, where a trajectory is a sequence of states, actions and rewards. - past_key_values (
Tuple[Tuple[torch.Tensor]]
of lengthconfig.n_layers
, optional) — Contains precomputed hidden-states (key and values in the attention blocks) as computed by the model (seepast_key_values
output below). Can be used to speed up sequential decoding. Theinput_ids
which have their past given to this model should not be passed asinput_ids
as they have already been computed. - targets (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional) — Desired targets used to compute the loss. - attention_mask (
torch.FloatTensor
of shape(batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked.
- use_cache (
bool
, optional) — If set toTrue
,past_key_values
key value states are returned and can be used to speed up decoding (seepast_key_values
). - output_attentions (
bool
, optional) — Whether or not to return the attentions tensors of all attention layers. Seeattentions
under returned tensors for more detail. - output_hidden_states (
bool
, optional) — Whether or not to return the hidden states of all layers. Seehidden_states
under returned tensors for more detail. - return_dict (
bool
, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
Returns
transformers.models.deprecated.trajectory_transformer.modeling_trajectory_transformer.TrajectoryTransformerOutput
or tuple(torch.FloatTensor)
A transformers.models.deprecated.trajectory_transformer.modeling_trajectory_transformer.TrajectoryTransformerOutput
or a tuple of
torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various
elements depending on the configuration (TrajectoryTransformerConfig) and inputs.
- loss (
torch.FloatTensor
of shape(1,)
, optional, returned whenlabels
is provided) — Language modeling loss. - logits (
torch.FloatTensor
of shape(batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). - past_key_values (
Tuple[Tuple[torch.Tensor]]
, optional, returned whenuse_cache=True
is passed or whenconfig.use_cache=True
) — Tuple of lengthconfig.n_layers
, containing tuples of tensors of shape(batch_size, num_heads, sequence_length, embed_size_per_head)
). Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (seepast_key_values
input) to speed up sequential decoding. - hidden_states (
tuple(torch.FloatTensor)
, optional, returned whenoutput_hidden_states=True
is passed or whenconfig.output_hidden_states=True
) — Tuple oftorch.FloatTensor
(one for the output of the embeddings + one for the output of each layer) of shape(batch_size, sequence_length, hidden_size)
. Hidden-states of the model at the output of each layer plus the initial embedding outputs. - attentions (
tuple(torch.FloatTensor)
, optional, returned whenoutput_attentions=True
is passed or whenconfig.output_attentions=True
) — Tuple oftorch.FloatTensor
(one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length)
. GPT2Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The TrajectoryTransformerModel forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Examples:
>>> from transformers import TrajectoryTransformerModel
>>> import torch
>>> model = TrajectoryTransformerModel.from_pretrained(
... "CarlCochet/trajectory-transformer-halfcheetah-medium-v2"
... )
>>> model.to(device)
>>> model.eval()
>>> observations_dim, action_dim, batch_size = 17, 6, 256
>>> seq_length = observations_dim + action_dim + 1
>>> trajectories = torch.LongTensor([np.random.permutation(self.seq_length) for _ in range(batch_size)]).to(
... device
... )
>>> targets = torch.LongTensor([np.random.permutation(self.seq_length) for _ in range(batch_size)]).to(device)
>>> outputs = model(
... trajectories,
... targets=targets,
... use_cache=True,
... output_attentions=True,
... output_hidden_states=True,
... return_dict=True,
... )