
How do Unity ML-Agents work?

Before training our agent, we need to understand what ML-Agents is and how it works.

What is Unity ML-Agents?

Unity ML-Agents is a toolkit for the game engine Unity that allows us to create environments using Unity or use pre-made environments to train our agents.

It’s developed by Unity Technologies, the developers of Unity, one of the most famous game engines, used by the creators of Firewatch, Cuphead, and Cities: Skylines.

Firewatch was made with Unity

The six components

With Unity ML-Agents, you have six essential components:

Source: Unity ML-Agents Documentation
  • The first is the Learning Environment, which contains the Unity scene (the environment) and the environment elements (game characters).
  • The second is the Python Low-level API, which contains the low-level Python interface for interacting with and manipulating the environment. It’s the API we use to launch the training (see the sketch after this list).
  • Then, we have the External Communicator that connects the Learning Environment (made with C#) with the low-level Python API (Python).
  • The Python trainers: the reinforcement learning algorithms made with PyTorch (PPO, SAC…).
  • The Gym wrapper: to encapsulate the RL environment in a gym wrapper.
  • The PettingZoo wrapper: PettingZoo is the multi-agent version of the Gym wrapper.
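To make the Python Low-level API concrete, here is a minimal sketch of driving a built Unity environment from Python with the mlagents_envs package. The executable path is a hypothetical placeholder, and ten random steps stand in for a real trainer:

```python
from mlagents_envs.environment import UnityEnvironment

# Launching the environment starts the External Communicator
# between the Unity scene (C#) and this Python process.
env = UnityEnvironment(file_name="path/to/EnvBuild")  # hypothetical build path
env.reset()

# A "behavior" groups all agents that share the same observation/action spec.
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for _ in range(10):
    # Agents requesting an action vs. agents whose episode just ended.
    decision_steps, terminal_steps = env.get_steps(behavior_name)

    # Random actions as a stand-in for a trained policy.
    action = spec.action_spec.random_action(len(decision_steps))
    env.set_actions(behavior_name, action)
    env.step()  # advance the Unity simulation

env.close()
```

The Gym and PettingZoo wrappers expose the same environment through the standard Gym / PettingZoo interfaces instead (for example, UnityToGymWrapper in recent releases), which is convenient for reusing existing RL code.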

Inside the Learning Environment

Inside the Learning Environment, we have two important elements:

  • The first is the agent component, the actor of the scene. We’ll train the agent by optimizing its policy (which will tell us what action to take in each state). The policy is called the Brain.
  • The second is the Academy. This component orchestrates agents and their decision-making processes. Think of the Academy as a teacher that handles requests from the Python API.

To better understand its role, let’s remember the RL process. This can be modeled as a loop that works like this:

The RL Process: a loop of state, action, reward and next state
Source: Reinforcement Learning: An Introduction, Richard Sutton and Andrew G. Barto

Now, let’s imagine an agent learning to play a platform game. The RL process looks like this:

  • Our Agent receives state S_0 from the Environment: we receive the first frame of our game (Environment).
  • Based on that state S_0, the Agent takes action A_0: our Agent will move to the right.
  • The Environment transitions to a new state S_1: a new frame.
  • The Environment gives some reward R_1 to the Agent: we’re not dead (Positive Reward +1).

This RL loop outputs a sequence of state, action, reward and next state. The goal of the agent is to maximize the expected cumulative reward.
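As a minimal sketch, this loop looks like the following in Python, where `env` and `policy` are hypothetical placeholders for an environment and a trained policy:

```python
# One episode of the RL loop described above.
state = env.reset()                               # receive S_0
cumulative_reward = 0.0
done = False
while not done:
    action = policy(state)                        # pick A_t based on S_t
    state, reward, done, info = env.step(action)  # get S_{t+1} and R_{t+1}
    cumulative_reward += reward                   # the quantity the agent tries to maximize
```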

The Academy is the component that sends orders to our Agents and ensures that agents stay in sync:

  • Collect Observations
  • Select your action using your policy
  • Take the Action
  • Reset if you reached the max step or if you’re done.
The MLAgents Academy
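As a rough illustration, here is how those four steps line up with the low-level API calls from the earlier sketch (this reuses `env`, `behavior_name`, and `spec` from that example; the random action is again a stand-in for a trained policy):

```python
decision_steps, terminal_steps = env.get_steps(behavior_name)  # 1. Collect Observations
action = spec.action_spec.random_action(len(decision_steps))   # 2. Select your action using your policy
env.set_actions(behavior_name, action)
env.step()                                                     # 3. Take the Action
# 4. Agents that reached the max step or are done appear in terminal_steps;
#    Unity resets them on the next simulation step.
```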

Now that we understand how ML-Agents works, we’re ready to train our agents.
