# Using 🤗 Simulate to learn Agent behaviors with Stable-Baselines3

We provide several example RL integrations with the Stable-Baselines3 library. To install this dependency, run `pip install simulate[sb3]`.

The examples cover:

• Learning to navigate in a simple T-Maze
• Collecting objects
• Navigating in procedurally generated mazes
• Physical interaction with movable objects
• Reward functions based on line-of-sight observation of objects

## Learning to navigate in a simple T-Maze

Example: sb3_basic_maze.py

Objective: Navigate to a spherical object in a simple T-Maze. Upon object collection, the environment resets.

Actors: An EgoCentric Camera Actor equipped with a monocular camera.

Observation space:

• An RGB camera of shape (3, 40, 40) (C, H, W) in uint8 format.

Action space:

• A discrete action space with 3 possible actions:
  • Turn left 10 degrees
  • Turn right 10 degrees
  • Move forward
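As a rough illustration of what the three discrete actions do to the agent's pose, the sketch below applies them to a 2D position and heading. The action indices, the `apply_action` helper, and the 0.5-unit step length are assumptions made for this example; they are not taken from the Simulate API or the example script.

```python
import math

# Hypothetical action indices; the example script may order them differently.
TURN_LEFT, TURN_RIGHT, MOVE_FORWARD = 0, 1, 2

TURN_DEGREES = 10.0  # turn size from the action list
STEP_LENGTH = 0.5    # assumed forward step size, not stated in the source


def apply_action(x, y, heading_deg, action):
    """Return the new (x, y, heading_deg) after one discrete action.

    Heading is measured counter-clockwise from the +x axis, in degrees.
    """
    if action == TURN_LEFT:
        heading_deg = (heading_deg + TURN_DEGREES) % 360.0
    elif action == TURN_RIGHT:
        heading_deg = (heading_deg - TURN_DEGREES) % 360.0
    elif action == MOVE_FORWARD:
        x += STEP_LENGTH * math.cos(math.radians(heading_deg))
        y += STEP_LENGTH * math.sin(math.radians(heading_deg))
    else:
        raise ValueError(f"unknown action: {action}")
    return x, y, heading_deg
```

Because turning and moving are separate actions, the agent has to spend several steps rotating before it can change its direction of travel, which is part of what makes the navigation tasks non-trivial.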

Reward function:

• A dense reward based on improvement in the best Euclidean distance to the object
• A sparse reward of +1 when the object is collected
• A timeout penalty of -1 if the agent does not reach the object within 200 time-steps
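One way to read the three reward terms is sketched below in plain Python. This is an interpretation, not the code from the example script: the `TMazeReward` class is hypothetical, and it assumes the dense term pays out only the amount by which the agent beats its best distance so far.

```python
import math

MAX_STEPS = 200  # timeout from the reward list


class TMazeReward:
    """Hypothetical helper: tracks the best distance so far and scores each step."""

    def __init__(self, start_distance):
        self.best_distance = start_distance
        self.steps = 0

    def step(self, agent_pos, target_pos, collected):
        """Return (reward, done) for one time-step."""
        self.steps += 1
        distance = math.dist(agent_pos, target_pos)

        # Dense term: reward only the improvement over the best distance so far,
        # so moving away and coming back is not rewarded twice.
        reward = max(0.0, self.best_distance - distance)
        self.best_distance = min(self.best_distance, distance)

        if collected:
            return reward + 1.0, True   # sparse bonus; the environment resets
        if self.steps >= MAX_STEPS:
            return reward - 1.0, True   # timeout penalty
        return reward, False
```

Under this reading, the total dense reward over an episode is bounded by the starting distance to the object, which keeps the shaping term from dominating the sparse +1.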

Parallel: 4 independent instances of the same environment configuration.

## Collecting objects

Example: sb3_collectables.py

Objective: Collect all 20 objects in a large square room.

Actors: An EgoCentric Camera Actor equipped with a monocular camera.

Observation space:

• An RGB camera of shape (3, 40, 40) (C, H, W) in uint8 format.

Action space:

• A discrete action space with 3 possible actions:
  • Turn left 10 degrees
  • Turn right 10 degrees
  • Move forward

Reward function:

• A sparse reward of +1 when an object is collected
• A timeout penalty of -1 if the agent does not reach the object within 500 time-steps
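To make the scale of the returns concrete, the sketch below accumulates the reward over one episode. The `episode_return` helper is hypothetical, and it assumes the episode ends either when all 20 objects have been collected or when the 500-step timeout is hit; the source does not spell out the termination rule.

```python
N_OBJECTS = 20   # from the objective
MAX_STEPS = 500  # timeout from the reward list


def episode_return(collect_events):
    """Sum the reward for one episode.

    collect_events: iterable of bools, one per time-step, True when an
    object is collected on that step.
    Returns (total_reward, all_collected).
    """
    remaining = N_OBJECTS
    total = 0.0
    for step, collected in enumerate(collect_events, start=1):
        if collected:
            remaining -= 1
            total += 1.0                # sparse +1 per object
        if remaining == 0:
            return total, True          # assumed: episode ends on full collection
        if step >= MAX_STEPS:
            return total - 1.0, False   # timeout penalty
    return total, False
```

Under these assumptions the return ranges from -1 (nothing collected before the timeout) to +20 (every object collected in time).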

Parallel: 4 independent instances of the same environment configuration.

## Navigating in procedurally generated mazes

Example: sb3_procgen.py

Objective: Navigate to an object in a 3D maze. When the object is collected, the environment resets.

Actors: An EgoCentric Camera Actor equipped with a monocular camera.

Observation space:

• An RGB camera of shape (3, 40, 40) (C, H, W) in uint8 format.

Action space:

• A discrete action space with 3 possible actions:
  • Turn left 10 degrees
  • Turn right 10 degrees
  • Move forward

Reward function:

• A sparse reward of +1 when the object is reached
• A timeout penalty of -1 if the agent does not reach the object within 500 time-steps

Parallel: 4 independent instances of randomly generated environment configurations.

## Physical interaction with movable objects

Example: sb3_move_boxes.py

Objective: Push the boxes in a room close to each other.

Actors: An EgoCentric Camera Actor equipped with a monocular camera.

Observation space:

• An RGB camera of shape (3, 40, 40) (C, H, W) in uint8 format.

Action space:

• A discrete action space with 3 possible actions:
  • Turn left 10 degrees
  • Turn right 10 degrees
  • Move forward

Reward function:

• A reward for moving the red and yellow boxes close to each other
• A reward for moving the green and white boxes close to each other
• A timeout penalty of -1 if the agent does not reach the object within 100 time-steps
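The source does not state the exact form of the pairwise box rewards. A minimal sketch, assuming each pair is rewarded by how much the distance between its two boxes shrank during the step, might look like this. The `box_reward` and `pair_distances` helpers and the 2D positions are illustrative only.

```python
import math

# The two colour pairs named in the reward list.
PAIRS = [("red", "yellow"), ("green", "white")]


def pair_distances(positions):
    """Map each colour pair to the Euclidean distance between its two boxes."""
    return {pair: math.dist(positions[pair[0]], positions[pair[1]]) for pair in PAIRS}


def box_reward(prev_positions, positions):
    """Assumed reward: total reduction in pair distance over one step.

    Positive when boxes in a pair moved closer, negative when they moved apart.
    """
    prev = pair_distances(prev_positions)
    curr = pair_distances(positions)
    return sum(prev[pair] - curr[pair] for pair in PAIRS)
```

A difference-based reward like this one sums to the overall reduction in distance across an episode, so the agent cannot farm reward by pushing a box back and forth.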

Parallel: 16 independent instances of the same environment configuration.

## Reward functions based on line-of-sight observation of objects

Example: sb3_visual_reward.py

Objective: Move the agent so that the box is within the agent's field of view.

Actors: An EgoCentric Camera Actor equipped with a monocular camera.

Observation space:

• An RGB camera of shape (3, 40, 40) (C, H, W) in uint8 format.

Action space:

• A discrete action space with 3 possible actions:
  • Turn left 10 degrees
  • Turn right 10 degrees
  • Move forward

Reward function:

• A sparse reward for moving the box within a 60-degree field-of-view (FOV) cone in front of the agent
• A timeout penalty of -1 if the agent does not reach the object within 100 time-steps
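The geometric part of the FOV condition can be sketched in 2D as follows. This is an illustration, not the library's implementation: the `in_fov` helper is hypothetical, it assumes the 60 degrees is the full cone angle (30 degrees either side of the agent's heading), and it ignores occlusion, which a true line-of-sight test would also need to check.

```python
import math

FOV_DEGREES = 60.0  # assumed to be the full cone angle, centred on the heading


def in_fov(agent_pos, heading_deg, target_pos, fov_degrees=FOV_DEGREES):
    """Return True if target_pos lies inside the agent's FOV cone.

    heading_deg is measured counter-clockwise from the +x axis.
    """
    dx = target_pos[0] - agent_pos[0]
    dy = target_pos[1] - agent_pos[1]
    if dx == 0 and dy == 0:
        return True  # target at the agent's position; treated as visible here

    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angular offset from the heading, wrapped into [-180, 180).
    offset = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    return abs(offset) <= fov_degrees / 2.0
```

For example, with the agent at the origin facing along +x, `in_fov((0.0, 0.0), 0.0, (5.0, 2.0))` is true because the box sits about 22 degrees off-axis, while a box at `(5.0, 5.0)` is 45 degrees off-axis and falls outside the cone.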

Parallel: 4 independent instances of the same environment configuration.