
RL Environment Wrappers

class simulate.RLEnv

( scene: Scene time_step: typing.Optional[float] = 0.03333333333333333 frame_skip: typing.Optional[int] = 4 )

Parameters

  • scene (Scene) — The Simulate scene to be wrapped.
  • time_step (float, optional, defaults to 1/30.0) — The physics timestep of the environment.
  • frame_skip (int, optional, defaults to 4) — The number of times an action is repeated in the backend simulation before the next observation is returned.
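With the defaults above, each call to step advances the physics time_step seconds, frame_skip times, so one action spans time_step * frame_skip seconds of simulated time. A quick sanity check of that arithmetic:

```python
# Effective control interval implied by the default RLEnv settings.
time_step = 1 / 30.0   # physics timestep in seconds
frame_skip = 4         # physics steps per action

control_interval = time_step * frame_skip   # simulated seconds per action
actions_per_second = 1 / control_interval

print(control_interval)    # ~0.1333 s of simulated time per action
print(actions_per_second)  # 7.5 actions per simulated second
```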

The basic RL environment wrapper for a Simulate scene, following the Gym API.
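Since the wrapper follows the Gym API, the usual reset/step loop applies. A minimal sketch of that loop, using a stand-in class in place of a real wrapped Simulate scene (all names in the stub are illustrative, not part of the simulate API):

```python
# Sketch of the Gym-style interaction loop RLEnv follows. The stand-in
# environment mimics the surface documented here: reset() -> obs dict,
# step(action) -> (observation, reward, done, info).
import random

class StubEnv:
    """Minimal stand-in mimicking the documented RLEnv surface."""
    def reset(self):
        return {"camera": [0.0]}            # observations keyed by sensor

    def sample_action(self):
        return [[[random.uniform(-1, 1)]]]  # (n_maps, n_actors, action_dim)

    def step(self, action):
        obs = {"camera": [0.0]}
        return obs, 1.0, True, {}           # observation, reward, done, info

    def close(self):
        pass

env = StubEnv()
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.sample_action()
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()
print(total_reward)
```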

close

( )

Close the scene.

get_attr

( attr_name: str indices: typing.Any = None )

Return a class attribute by name.

reset

( ) → obs (Dict)

Returns

obs (Dict)

the observation of the environment after reset.

Resets the actors and the scene of the environment.

sample_action

( ) → action (list[list[list[float]]])

Returns

action (list[list[list[float]]])

A nested list of actions with dimensions (n_maps, n_actors, action_dim).

Samples an action from the actors in the environment. This function loads the configuration of maps and actors to return the correct shape across multiple configurations.
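The nested shape described above can be checked with plain numpy; the sizes here are arbitrary examples, not library defaults:

```python
import numpy as np

n_maps, n_actors, action_dim = 2, 3, 4  # example sizes for illustration

# Build an action batch shaped like sample_action's return value:
# an outer list over maps, then over actors, then the per-actor action vector.
action = [
    [[0.0] * action_dim for _ in range(n_actors)]
    for _ in range(n_maps)
]

print(np.asarray(action).shape)  # (2, 3, 4) -> (n_maps, n_actors, action_dim)
```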

step

( action: typing.Union[typing.Dict, typing.List, numpy.ndarray] ) → observation (Dict)

Parameters

  • action (Dict or List or ndarray) — The action to be taken in the environment.

Returns

observation (Dict)

A dictionary of observations from the environment.
reward (float): The reward for the action.
done (bool): Whether the episode has ended.
info (Dict): A dictionary of additional information.

The step function for the environment; follows the OpenAI Gym API.

The action is expected to be a dict with actuator tags as keys and, as values, tensors of shape (n_show, n_actors, n_actions).

step_async

( actions: ndarray )

Step the environment asynchronously.

step_recv_async

( ) → observation (Dict)

Returns

observation (Dict)

A dictionary containing the observation from the environment.
reward (float): The reward for the action.
done (bool): Whether the episode has ended.
info (Dict): A dictionary of additional information.

Receive the response from the environment asynchronously.

step_send_async

( action: typing.Union[typing.Dict, typing.List, numpy.ndarray] )

Parameters

  • action (Dict or List or ndarray) — The action to be executed in the environment.

Send an action for execution asynchronously.
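Together, step_send_async followed by step_recv_async perform the work of a blocking step. A sketch of that split pattern, using a stand-in class in place of the real backend (names illustrative, not the simulate API):

```python
# Sketch of the split send/receive pattern: the send call dispatches the
# action, the receive call blocks until the result arrives. A stand-in
# class stands in for the real backend executable.
class StubAsyncEnv:
    def __init__(self):
        self._pending = None

    def step_send_async(self, action):
        self._pending = action            # dispatch the action

    def step_recv_async(self):
        action = self._pending            # collect the response
        self._pending = None
        return {"camera": action}, 0.0, False, {}

    def step(self, action):
        # A blocking step is just a send followed by a receive.
        self.step_send_async(action)
        return self.step_recv_async()

env = StubAsyncEnv()
obs, reward, done, info = env.step([1.0])
print(obs)  # {'camera': [1.0]}
```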

class simulate.ParallelRLEnv

( map_fn: typing.Union[typing.Callable, simulate.scene.Scene] n_maps: typing.Optional[int] = 1 n_show: typing.Optional[int] = 1 time_step: typing.Optional[float] = 0.03333333333333333 frame_skip: typing.Optional[int] = 4 **engine_kwargs )

Parameters

  • map_fn (Callable or Scene) — a function that returns a Scene, used to generate instances of the desired environment map (a Scene may also be passed directly).
  • n_maps (int, optional, defaults to 1) — the number of map instances to create.
  • n_show (int, optional, defaults to 1) — the number of maps to show and step concurrently.
  • time_step (float, optional, defaults to 1/30.0) — the physics timestep of the environment.
  • frame_skip (int, optional, defaults to 4) — the number of times an action is repeated in the backend simulation before the next observation is returned.

RL environment wrapper for a Simulate scene. Reuses functionality from VecEnv in Stable Baselines 3. For more information on VecEnv, see the source: https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html
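The map_fn argument lets the wrapper build several independent copies of a map. A sketch of that idea with a placeholder scene factory (the Scene class here is a stand-in, not the simulate API):

```python
# Sketch of the map-generation idea behind ParallelRLEnv: map_fn is called
# once per map index so each map is an independent scene instance.
class PlaceholderScene:
    """Stand-in for a simulate Scene (illustrative only)."""
    def __init__(self, index):
        self.index = index  # e.g. could be used to offset the map's position

def map_fn(index):
    """Build one instance of the desired environment map."""
    return PlaceholderScene(index)

n_maps = 4
maps = [map_fn(i) for i in range(n_maps)]
print([m.index for m in maps])  # [0, 1, 2, 3]
```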

close

( )

Close the environment.

env_is_wrapped

( wrapper_class: typing.Type[gym.core.Wrapper] indices: typing.Union[NoneType, int, typing.Iterable[int]] = None )

Check whether the environments are wrapped with the given Gym wrapper class.

reset

( ) → obs (Dict)

Returns

obs (Dict)

the observation of the environment after reset.

Resets the actors and the scene of the environment.

sample_action

( ) → action (list[list[list[float]]])

Returns

action (list[list[list[float]]])

A nested list of actions with dimensions (n_maps, n_actors, action_dim).

Samples an action from the actors in the environment. This function loads the configuration of maps and actors to return the correct shape across multiple configurations.

step

( action: typing.Union[typing.Dict, typing.List, numpy.ndarray] ) → all_observation (List[Dict])

Parameters

  • action (Dict or List or ndarray) — a dict or list of actions for each actuator.

Returns

all_observation (List[Dict])

a list of dicts of observations for each sensor.
all_reward (List[float]): all the rewards for the current step.
all_done (List[bool]): whether each episode is done.
all_info (List[Dict]): a list of dicts of additional information.

The step function for the environment; follows the OpenAI Gym API.

The action is expected to be a dict with actuator tags as keys and, as values, tensors of shape (n_show, n_actors, n_actions).

step_recv_async

( ) → obs (Dict)

Returns

obs (Dict)

A dict of observations for each sensor.
reward (float): The reward for the current step.
done (bool): Whether the episode is done.
info (Dict): A dict of additional information.

Receive the response of a step from the environment asynchronously.

step_send_async

( action: typing.Union[typing.Dict, typing.List, numpy.ndarray] )

Parameters

  • action (Dict or List or np.ndarray) — A dict or list of actions for each actuator.

Send a step to the environment asynchronously.

class simulate.MultiProcessRLEnv

( env_fn: typing.Callable n_parallel: int starting_port: int = 55001 )

Parameters

  • env_fn (Callable) — a generator function that returns an RLEnv / ParallelRLEnv, used to create instances of the desired environment.
  • n_parallel (int) — the number of executable instances to create.
  • starting_port (int, optional, defaults to 55001) — initial communication port for spawned executables.

Multi-process RL environment wrapper for a Simulate scene. Spawns multiple backend executables to run in parallel, in addition to optionally running multiple maps per executable. Reuses functionality from VecEnv in Stable Baselines 3. For more information on VecEnv, see the source: https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html
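The parameter description above suggests each spawned executable communicates on its own port, counted up from starting_port. A sketch of that allocation scheme (an assumption inferred from the parameter docs, not a guaranteed implementation detail):

```python
# Sketch of a per-executable port allocation scheme: each of the n_parallel
# spawned backends gets a distinct port starting from starting_port.
n_parallel = 3
starting_port = 55001  # the documented default

ports = [starting_port + i for i in range(n_parallel)]
print(ports)  # [55001, 55002, 55003]
```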

step

( actions: typing.Union[list, numpy.ndarray, NoneType] = None ) → all_observation (Dict)

Parameters

  • actions (Dict or List or ndarray, optional) — the actions to execute; expected to be a dict with actuator tags as keys and, as values, tensors of shape (n_show, n_actors, n_actions).

Returns

all_observation (Dict)

the observations gathered from the parallel environments.
all_reward (float): the rewards for the current step.
all_done (bool): whether each episode is done.
all_info (Dict): additional information from each environment.

The step function for the environment; follows the OpenAI Gym API.