Using πŸ€— Simulate to learn Agent behaviors with Stable-Baselines3

We provide several example RL integrations with the Stable-Baselines3 (LINK) library. To install this dependency, use `pip install simulate[sb3]`. All of the examples share a common training skeleton, sketched after the list below.

The examples cover:

  • Learning to navigate in a simple T-Maze
  • Collecting objects
  • Navigating in procedurally generated mazes
  • Physical interaction with movable objects
  • Reward functions based on line-of-sight observation of objects
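
Concretely, every example builds a scene, attaches an egocentric camera actor and its reward functions, wraps the scene as a vectorized RL environment, and trains a PPO agent on the pixel observations. The sketch below is a minimal version of that skeleton, not a verbatim copy of the scripts: the `sm.Scene`, `sm.EgocentricCameraActor`, `sm.RewardFunction`, and `sm.ParallelRLEnv` names follow the πŸ€— Simulate example scripts and should be treated as assumptions that may vary between versions.

```python
import simulate as sm
from stable_baselines3 import PPO

def make_env():
    """Build one scene instance; the contents differ per example."""
    scene = sm.Scene(engine="unity")                      # backend used by the examples
    scene += sm.Box(name="floor", position=[0, 0, 0],
                    bounds=[-5, 5, 0, 0.1, -5, 5])        # simple ground plane
    target = sm.Sphere(name="target", position=[2, 0.5, 2])
    scene += target

    # Egocentric actor with a 40x40 monocular RGB camera -> (3, 40, 40) uint8 observations.
    actor = sm.EgocentricCameraActor(camera_width=40, camera_height=40,
                                     position=[0, 0.5, 0])
    scene += actor

    # Sparse reward when the actor reaches the target (constructor arguments assumed).
    actor += sm.RewardFunction(type="sparse", entity_a=actor, entity_b=target,
                               distance_metric="euclidean", threshold=0.2,
                               scalar=1.0, is_terminal=True)
    return scene

# 4 parallel copies of the environment, as in the examples (wrapper name assumed).
env = sm.ParallelRLEnv(make_env, n_parallel=4)

# Pixel observations -> CNN policy; the scripts may use MultiInputPolicy
# instead if the observation space is a dict.
model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
env.close()
```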

Learning to navigate in a simple T-Maze

Example: sb3_basic_maze.py

Objective: Navigate to a spherical object in a simple T-Maze. Upon object collection, the environment resets.

Actors: An Egocentric Camera Actor (LINK) equipped with a monocular camera.

Observation space:

  • An RGB camera observation of shape (3, 40, 40) (C, H, W) in uint8 format.

Action space:

  • A discrete action space with 3 possible actions:
      • Turn left 10 degrees
      • Turn right 10 degrees
      • Move forward

Reward function:

  • A dense reward based on improvement in the best Euclidean distance to the object
  • A sparse reward of +1 when the object is collected
  • A timeout penalty of -1 if the agent does not reach the object within 200 time-steps

Parallel: 4 independent instances of the same environment configuration.
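
The three reward terms above translate into three reward-function declarations attached to the actor. A minimal sketch, assuming the `sm.RewardFunction` constructor arguments (`type`, `entity_a`, `entity_b`, `distance_metric`, `threshold`, `scalar`, `is_terminal`) used in the example scripts:

```python
import simulate as sm

scene = sm.Scene(engine="unity")
actor = sm.EgocentricCameraActor(camera_width=40, camera_height=40,
                                 position=[0, 0.5, 0])
sphere = sm.Sphere(name="target", position=[2, 0.5, 2])
scene += actor
scene += sphere

# Dense shaping term: pays out as the actor's best Euclidean distance
# to the sphere improves.
actor += sm.RewardFunction(type="dense", entity_a=actor, entity_b=sphere,
                           distance_metric="euclidean")

# Sparse +1 on collection; terminal, so the environment resets.
actor += sm.RewardFunction(type="sparse", entity_a=actor, entity_b=sphere,
                           distance_metric="euclidean", threshold=0.2,
                           scalar=1.0, is_terminal=True)

# -1 after 200 steps without success (the "timeout" type is an assumption).
actor += sm.RewardFunction(type="timeout", entity_a=actor, entity_b=actor,
                           scalar=-1.0, threshold=200, is_terminal=True)
```

The dense term shapes exploration from the first step, while the two terminal terms bound the episode from above and below.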

Collecting objects

Example: sb3_collectables.py

Objective: Collect all 20 objects in a large square room.

Actors: An Egocentric Camera Actor (LINK) equipped with a monocular camera.

Observation space:

  • An RGB camera observation of shape (3, 40, 40) (C, H, W) in uint8 format.

Action space:

  • A discrete action space with 3 possible actions:
      • Turn left 10 degrees
      • Turn right 10 degrees
      • Move forward

Reward function:

  • A sparse reward of +1 when an object is collected
  • A timeout penalty of -1 if the agent does not collect all of the objects within 500 time-steps

Parallel: 4 independent instances of the same environment configuration.
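
With 20 collectables, the sparse reward is declared once per object rather than once per scene, and the per-object rewards are non-terminal so the episode continues until everything is collected or the timeout fires. A minimal sketch under the same API assumptions as above:

```python
import random

import simulate as sm

scene = sm.Scene(engine="unity")
actor = sm.EgocentricCameraActor(camera_width=40, camera_height=40,
                                 position=[0, 0.5, 0])
scene += actor

for i in range(20):
    collectable = sm.Sphere(name=f"collectable_{i}",
                            position=[random.uniform(-9, 9), 0.5,
                                      random.uniform(-9, 9)])
    scene += collectable
    # +1 each time the actor reaches this object; non-terminal, so the
    # episode runs on until all 20 are collected or the timeout fires.
    actor += sm.RewardFunction(type="sparse", entity_a=actor,
                               entity_b=collectable, threshold=0.5,
                               scalar=1.0, is_terminal=False)
```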

Navigating in procedurally generated mazes

Example: sb3_procgen.py

Objective: Navigate to an object in a 3D maze; when the object is collected, the environment resets.

Actors: An Egocentric Camera Actor (LINK) equipped with a monocular camera.

Observation space:

  • An RGB camera observation of shape (3, 40, 40) (C, H, W) in uint8 format.

Action space:

  • A discrete action space with 3 possible actions:
      • Turn left 10 degrees
      • Turn right 10 degrees
      • Move forward

Reward function:

  • A sparse reward of +1 when the object is reached
  • A timeout penalty of -1 if the agent does not reach the object within 500 time-steps

Parallel: 4 independent instances of randomly generated environment configurations.
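
The randomly generated instances differ from the fixed-layout examples only in scene construction: each parallel worker builds its own maze. A sketch with a hypothetical build_random_maze helper standing in for the procedural generation in sb3_procgen.py:

```python
import simulate as sm
from stable_baselines3 import PPO

def build_random_maze():
    # Hypothetical helper: the real script generates the maze walls
    # procedurally, then places the target object and the egocentric
    # actor (40x40 camera) inside the maze.
    scene = sm.Scene(engine="unity")
    ...  # procedural wall / target / actor placement
    return scene

# Each worker calls the builder independently, so every one of the 4
# instances gets its own randomly generated maze (wrapper name assumed).
env = sm.ParallelRLEnv(build_random_maze, n_parallel=4)
model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)
```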

Physical interaction with movable objects

Example: sb3_move_boxes.py

Objective: Push the boxes in the room close to one another.

Actors: An Egocentric Camera Actor (LINK) equipped with a monocular camera.

Observation space:

  • An RGB camera observation of shape (3, 40, 40) (C, H, W) in uint8 format.

Action space:

  • A discrete action space with 3 possible actions:
      • Turn left 10 degrees
      • Turn right 10 degrees
      • Move forward

Reward function:

  • A reward for moving the red and yellow boxes close to each other
  • A reward for moving the green and white boxes close to each other
  • A timeout penalty of -1 if the agent does not complete the task within 100 time-steps

Parallel: 16 independent instances of the same environment configuration.
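
Here the dense rewards can be declared between the box pairs themselves rather than between the actor and a target, so reward accrues whenever a pair's mutual distance shrinks, wherever the actor is. A minimal sketch under the same API assumptions as above:

```python
import simulate as sm

scene = sm.Scene(engine="unity")
actor = sm.EgocentricCameraActor(camera_width=40, camera_height=40,
                                 position=[0, 0.5, 0])
red_box = sm.Box(name="red_box", position=[2, 0.5, 2])
yellow_box = sm.Box(name="yellow_box", position=[-2, 0.5, 2])
green_box = sm.Box(name="green_box", position=[2, 0.5, -2])
white_box = sm.Box(name="white_box", position=[-2, 0.5, -2])
for obj in (actor, red_box, yellow_box, green_box, white_box):
    scene += obj

# Dense rewards between the box pairs: improvement in each pair's
# mutual distance is rewarded directly.
actor += sm.RewardFunction(type="dense", entity_a=red_box, entity_b=yellow_box,
                           distance_metric="euclidean")
actor += sm.RewardFunction(type="dense", entity_a=green_box, entity_b=white_box,
                           distance_metric="euclidean")

# -1 if the boxes are not brought together within 100 steps.
actor += sm.RewardFunction(type="timeout", entity_a=actor, entity_b=actor,
                           scalar=-1.0, threshold=100, is_terminal=True)
```

Rewarding the box-to-box distance directly, instead of the actor's own position, is what makes this a physical-interaction task: the agent only scores by pushing.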

Reward functions based on line-of-sight observation of objects

Example: sb3_visual_reward.py

Objective: Move the agent so that the box is within its field of view.

Actors: An Egocentric Camera Actor (LINK) equipped with a monocular camera.

Observation space:

  • An RGB camera observation of shape (3, 40, 40) (C, H, W) in uint8 format.

Action space:

  • A discrete action space with 3 possible actions:
      • Turn left 10 degrees
      • Turn right 10 degrees
      • Move forward

Reward function:

  • A sparse reward for moving the box within a 60-degree field-of-view cone in front of the agent
  • A timeout penalty of -1 if the agent does not bring the box into view within 100 time-steps

Parallel: 4 independent instances of the same environment configuration.
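
Line-of-sight rewards replace the distance metric with an angular test: the reward fires when the target falls inside a view cone anchored to the actor. The sketch below assumes a "see" reward type whose threshold is the cone angle in degrees; both assumptions should be checked against sb3_visual_reward.py:

```python
import simulate as sm

scene = sm.Scene(engine="unity")
actor = sm.EgocentricCameraActor(camera_width=40, camera_height=40,
                                 position=[0, 0.5, 0])
box = sm.Box(name="box", position=[3, 0.5, 0])
scene += actor
scene += box

# Fires when the box enters the 60-degree cone in front of the actor;
# the "see" type and threshold-as-cone-angle semantics are assumptions.
actor += sm.RewardFunction(type="see", entity_a=actor, entity_b=box,
                           threshold=60, scalar=1.0, is_terminal=True)

# -1 timeout after 100 steps, as in the other examples.
actor += sm.RewardFunction(type="timeout", entity_a=actor, entity_b=actor,
                           scalar=-1.0, threshold=100, is_terminal=True)
```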