The Pyramid environment

The goal in this environment is to train our agent to get the gold brick at the top of the Pyramid. To do that, it needs to press a button to spawn a Pyramid, navigate to the Pyramid, knock it over, and move to the gold brick at the top.


The reward function

The reward function is:

  • +2 for reaching the golden brick at the top of the Pyramid.
  • A small penalty of -0.001 at every step, to push the agent to reach the brick as quickly as possible.

In terms of code, the logic looks like this (the environment itself is written in C# with Unity ML-Agents, so the sketch below is an illustrative Python version; every name in it is hypothetical):
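
```python
# Illustrative Python sketch of the Pyramids extrinsic reward logic.
# The real environment is written in C# with Unity ML-Agents, so every
# name here is hypothetical.

GOAL_REWARD = 2.0      # reward for reaching the golden brick
STEP_PENALTY = -0.001  # small existential penalty applied every step

def extrinsic_reward(reached_golden_brick: bool) -> float:
    """Extrinsic reward for one environment step."""
    if reached_golden_brick:
        return GOAL_REWARD
    return STEP_PENALTY
```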

To train this new agent that seeks the button and then the Pyramid to destroy, we’ll use a combination of two types of rewards:

  • The extrinsic one, given by the environment (described above).
  • But also an intrinsic one called curiosity. This second one pushes our agent to be curious, or in other words, to explore its environment better. The sketch below shows how the two are typically combined.
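
To make the combination concrete, here is a minimal Python sketch, assuming a hypothetical weighting coefficient `beta`; in practice, the ML-Agents PPO trainer combines the two signals internally, weighted by the `strength` values in the training config:

```python
# Hedged sketch: combining the environment (extrinsic) reward with a
# curiosity (intrinsic) bonus. `beta` and these function names are
# hypothetical; ML-Agents handles this weighting internally.

def total_reward(extrinsic: float, intrinsic: float, beta: float = 0.02) -> float:
    """Weighted sum of the extrinsic reward and the curiosity bonus."""
    return extrinsic + beta * intrinsic
```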

If you want to know more about curiosity, the next section (optional) will explain the basics.

The observation space

In terms of observation, we use 148 raycasts that can each detect objects (switch, bricks, golden brick, and walls).

We also use a boolean variable indicating the switch state (whether the switch has been turned on to spawn the Pyramid) and a vector that contains the agent’s speed.

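If you want to inspect these specs yourself, here is a small Python sketch using the `mlagents_envs` low-level API with a local Pyramids build (the `file_name` path is a placeholder; adapt it to where your build lives):

```python
# Inspecting the Pyramids observation and action specs with the
# mlagents_envs Python API. The file_name path is a placeholder.

from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name="./Pyramids")  # path to your local build
env.reset()

behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

# Raycast observations, switch state, and agent speed show up here.
for obs_spec in spec.observation_specs:
    print(obs_spec.name, obs_spec.shape)

# One discrete branch; its size is the number of available actions.
print(spec.action_spec.discrete_branches)

env.close()
```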

The action space

The action space is discrete, with four possible actions corresponding to moving forward, moving backward, rotating left, and rotating right.

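Continuing the sketch above, this is how a discrete action could be sent to the environment through `mlagents_envs` (the chosen action index is arbitrary, and `env` and `behavior_name` come from the previous snippet):

```python
import numpy as np
from mlagents_envs.base_env import ActionTuple

# One agent, one discrete branch: pick action index 2 (arbitrary choice).
action = ActionTuple(discrete=np.array([[2]], dtype=np.int32))

env.set_actions(behavior_name, action)  # env/behavior_name from above
env.step()
```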